Analysis of the computational complexity of solving random satisfiability problems using branch and bound
search algorithms.
The computational complexity of solving random 3-Satisfiability (3-SAT) problems is investigated.
3-SAT is a representative example of hard computational tasks; it consists in knowing
whether a set of αN randomly drawn logical constraints involving N Boolean variables can be satisfied
altogether or not. Widely used solving procedures, such as the Davis-Putnam-Loveland-Logemann
(DPLL) algorithm, perform a systematic search for a solution, through a sequence of trials and errors
represented by a search tree. The size of the search tree accounts for the computational complexity,
i.e. the amount of computational effort, required to achieve resolution. In the present study, we
identify, using theory and numerical experiments, easy (size of the search tree scaling polynomially
with N) and hard (exponential scaling) regimes as a function of the ratio α of constraints per
variable. The complexity is explicitly calculated in the different regimes, in very good agreement
with numerical simulations. Our theoretical approach is based on the analysis of the growth of the
branches in the search tree under the operation of DPLL. On each branch, the initial 3-SAT problem
is dynamically turned into a more generic 2+p-SAT problem, where p and 1 − p are the fractions of
constraints involving three and two variables respectively. The growth of each branch is monitored
by the dynamical evolution of α and p and is represented by a trajectory in the static phase diagram
of the random 2+p-SAT problem. Depending on whether or not the trajectories cross the boundary
between satisfiable and unsatisfiable phases, single branches or full trees are generated by DPLL,
resulting in easy or hard resolutions. Our picture for the origin of complexity can be applied to
other computational problems solved by branch and bound algorithms.
I. INTRODUCTION.



Out-of-equilibrium dynamical properties of physical systems form the subject of intense studies in modern statistical
physics [?]. Over the past decades, much progress has been made in fields as various as glassy dynamics, growth
processes, persistence phenomena, vortex depinning ... where dynamical aspects play a central role. Among all the
questions related to these issues, the existence and characterization of stationary states reached in some asymptotic
limit of large times is of central importance. In turn, the notion of asymptotic regime raises the question of relaxation,
or transient, behavior: how long do we need to wait for the system to relax? How does this time grow
with the size of the system? Such questions are not limited to out-of-equilibrium dynamics but also arise in the
study of critical slowing down phenomena accompanying second order phase transitions.

Computer science is another scientific discipline where dynamical issues are of central importance. There, the main
question is to know the time or, more precisely, the amount of computational resources required to solve some given
computational problem, and how this time increases with the size of the problem to be solved [?]. Consider for
instance the sorting problem [?]. One is given a list L of N integer numbers to be sorted in increasing order. What
is the computational complexity of this task, that is, the minimal number of operations (essentially comparisons)
necessary to sort any list L of length N? Knuth answered this question in the early seventies: complexity scales at
least as N log N and there exists a sorting algorithm, called Mergesort, achieving this lower bound [?].

To calculate computational complexity, one has to study how the configuration of data representing the computational
problem dynamically evolves under the prescriptions encoded in the algorithm. Let us consider the sorting
problem again and think of the initial list L as a (random) permutation of I = {1, 2, ..., N}. Starting from L(0) = L,
at each time step (operation) T, the sorting algorithm transforms the list L(T) into another list L(T + 1), finally
reaching the identity permutation, i.e. the ordered list I. Obviously, the dynamical rules imposed by a solving algorithm
are of a somewhat unusual nature from a physicist's point of view. They might be highly non-local and non-Markovian.
Yet, the operation of algorithms gives rise to well posed dynamical problems, to which methods and techniques of
statistical physics may be applied as we argue in this paper.
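The sorting example can be made concrete with a short sketch. The code below is an illustration only, not taken from the paper: it sorts a random permutation of {1, ..., N} with a textbook top-down Mergesort and counts comparisons, the "operations" referred to above. The function names `merge_sort` and `count_comparisons` and the `seed` parameter are hypothetical.

```python
import math
import random

def merge_sort(lst, counter):
    """Return a sorted copy of lst; counter[0] accumulates the comparisons made."""
    if len(lst) <= 1:
        return list(lst)
    mid = len(lst) // 2
    left = merge_sort(lst[:mid], counter)
    right = merge_sort(lst[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                      # one comparison = one elementary operation
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                  # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged

def count_comparisons(n, seed=0):
    """Sort a random permutation of {1, ..., n}; return (sorted list, number of comparisons)."""
    rng = random.Random(seed)
    perm = list(range(1, n + 1))
    rng.shuffle(perm)
    counter = [0]
    return merge_sort(perm, counter), counter[0]

if __name__ == "__main__":
    for n in (16, 256, 4096):
        _, comparisons = count_comparisons(n)
        print(n, comparisons, n * math.ceil(math.log2(n)))
```

The last printed column, N ⌈log_2 N⌉, is an upper bound on the number of comparisons of this variant, so the measured count can be compared directly with the N log N scaling.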
Unfortunately, not all problems encountered in computer science are as simple as sorting. Many computational
problems originating from industrial applications, e.g. scheduling, planning and more generally optimization tasks,
demand computing efforts growing enormously with their size N. For such problems, called NP-complete, all known
solving algorithms have execution times increasing a priori exponentially with N and it is a fundamental conjecture of
computer science that no polynomial solving procedure exists. To be more concrete, let us focus on the 3-satisfiability
(3-SAT) problem, a paradigm of the class of NP-complete computational problems. A pedagogical introduction
to the 3-SAT problem and some of the current open issues in theoretical computer science may be found in [?].

3-SAT is defined as follows. Consider a set of N Boolean variables and a set of M = αN constraints (called clauses),
each of which is the logical OR of three variables or of their negations. Then, try to figure out whether or not there
exists an assignment of variables satisfying all clauses. If such a solution exists, the set of clauses (called an instance
of the 3-SAT problem) is said to be satisfiable (sat); otherwise the instance is unsatisfiable (unsat). To solve a 3-SAT
instance, i.e. to know whether it is sat or unsat, one usually resorts to search algorithms, such as the ubiquitous
Davis-Putnam-Loveland-Logemann (DPLL) procedure [?]. DPLL operates by trials and errors, the sequence of which can be
graphically represented as a search tree. Computational complexity is the number of operations performed by the
solving algorithm and is conventionally measured by the size of the search tree.

Complexity may, in practice, vary enormously with the instance of the 3-SAT problem under consideration. To
understand why instances are easy or hard to solve, computer scientists have focused on model classes of 3-SAT
instances. Probabilistic models, which define distributions of random instances controlled by a few parameters, are
particularly useful. An example that has attracted a lot of attention over the past years is random 3-SAT: all clauses
are drawn randomly and each variable is negated or left unchanged with equal probabilities. Experiments [6]-[10] and
theory [11, 12] indicate that instances are almost surely sat (respectively unsat) if α is smaller (resp. larger)
than a critical threshold α_C ≃ 4.3 as soon as M, N go to infinity at fixed ratio α. This phase transition [?, 13] is
accompanied by a drastic peak of computational hardness at threshold, see Figure 1. Random 3-SAT generates
simplified and idealized versions of real-world instances. Yet, it reproduces essential features (sat vs. unsat, easy vs.
hard) and can shed light on the onset of complexity, in the same way as models of condensed matter physics help to
understand global properties of real materials.

Phases in random 3-SAT, or in physical systems, characterize the overall static behavior of a sample in the large
size limit - a large instance with ratio e.g. α = 3 will be almost surely sat (existence proof) - but do not convey direct
information on dynamical aspects - how long it will take to actually find a solution (constructive proof). This situation
is reminiscent of the learning problem in neural networks ("equilibrium" statistical mechanics allows one to compute the
maximal storage capacity, irrespective of the memorization procedure and of the learning time) [?], or of liquids at
low enough temperatures (that should crystallize from a thermodynamical point of view but undergo some kinetic
glassy arrest) [?].

This paper is an extended version of a previous work [17], where we showed how the dynamics induced by the
DPLL search algorithm could be analyzed using off-equilibrium statistical mechanics and combined with the static
phase diagram of random K-SAT (with K = 2, 3) to calculate computational complexity. We start by presenting in
detail the definition of the random K-SAT problem and the DPLL procedure in Section II. We then present the
experimental measures of complexity in Section III. Our analytical approach is based on the fact that, under DPLL
action, the initial instance is modified and follows some trajectory in the phase diagram. The structure of the search
tree generated by the DPLL procedure is closely related to the nature of the region visited by the instance trajectory.
Search trees reduce to essentially one branch - sat instances at low ratio α, section IV - or are dense, including an
exponential number of branches - unsat instances, section V. Mixed structures - sat instances with ratios slightly
below threshold, section VI - are made of a branch followed by a dense tree and reflect trajectories crossing the
phase boundary between sat and unsat regimes. While branch trajectories could be obtained straightforwardly from
previous works by Chao and Franco [18], we develop in section V a formalism to study the construction of dense trees
by DPLL. We show that the latter can be reformulated in terms of a (bidimensional) growth process described by a
non-linear partial differential equation. The resolution of this growth equation allows an analytical prediction of the
complexity that compares very well to extensive numerical experiments. We present in Section VII the full complexity
diagram of solving random SAT models, and explain the relationship with static studies of the phase transition [?].
Last of all, we show in Section VIII how our study suggests some possible ways to improve existing algorithms.

1 The analogy between relaxation in physical systems and computational complexity in combinatorial problems is even clearer
when the latter are solved using local search algorithms, e.g. simulated annealing or other solving strategies making local
moves based on some energy function. Consider a random 3-SAT instance below threshold. We define the energy function
(depending on the configuration of Boolean variables) as the number of unsatisfied clauses [?]. The goal of the algorithm is
to find a solution, i.e. to minimize the energy. The configuration of Boolean variables evolves from some initial value to some
solution under the action of the algorithm. During this evolution, the energy of the "system" relaxes from an initial (large)
value to zero. Computational complexity is, in this case, equal to the relaxation time of the dynamics before reaching (zero
temperature) equilibrium.



II. DAVIS-PUTNAM-LOVELAND-LOGEMANN ALGORITHM AND RANDOM 3-SAT.

In this section, we briefly recall the main features of the random 3-Satisfiability model. We then
present the Davis-Putnam-Loveland-Logemann (DPLL) solving procedure, a paradigm of branch and bound algorithms,
and the notion of search tree. Finally, we introduce the idea of a dynamical trajectory, followed by an instance under
the action of DPLL.



A. A reminder on random Satisfiability. 

Random K-SAT is defined as follows. Let us consider N Boolean variables x_i that can be either true (T) or false
(F) (i = 1, ..., N). We choose randomly K among the N possible indices i and then, for each of them, a literal, that
is, the corresponding x_i or its negation ¬x_i with equal probabilities one half. A clause C is the logical OR of the K
previously chosen literals, that is, C will be true (or satisfied) if and only if at least one literal is true. Next, we repeat
this process to obtain M independently chosen clauses {C_i}_{i=1,...,M} and ask for all of them to be true at the same
time (i.e. we take the logical AND of the M clauses). The resulting logical formula is called an instance of the K-SAT
problem. A logical assignment of the x_i's satisfying all clauses, if any, is called a solution of the instance.
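The construction above can be sketched in a few lines. This is a minimal illustration under stated assumptions: literals are encoded as signed integers (+i for x_i, -i for its negation), a convention chosen here and not taken from the paper, and the names `random_ksat` and `satisfies` are hypothetical.

```python
import random

def random_ksat(n_vars, n_clauses, k=3, seed=None):
    """Draw n_clauses independent clauses, each over k distinct variables among n_vars."""
    rng = random.Random(seed)
    formula = []
    for _ in range(n_clauses):
        indices = rng.sample(range(1, n_vars + 1), k)    # K distinct indices among N
        # each chosen variable is negated or left unchanged with probability one half
        clause = tuple(i if rng.random() < 0.5 else -i for i in indices)
        formula.append(clause)
    return formula

def satisfies(formula, assignment):
    """True if the assignment (dict: variable index -> bool) satisfies every clause."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)   # logical OR inside a clause
        for clause in formula                                       # logical AND over clauses
    )

if __name__ == "__main__":
    n, alpha = 20, 3.0
    instance = random_ksat(n, int(alpha * n), k=3, seed=1)
    all_true = {i: True for i in range(1, n + 1)}
    print(len(instance), satisfies(instance, all_true))
```

Here the ratio α = M/N is fixed by the caller through the choice of `n_clauses`.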

For large instances (M, N → ∞), K-SAT exhibits a striking threshold phenomenon as a function of the ratio
α = M/N of the number of clauses per variable. Numerical simulations indicate that the probability of finding a
solution falls abruptly from one down to zero when α crosses a critical value α_C(K) [?]. Above α_C(K), the clauses
can no longer all be satisfied. This scenario is rigorously established in the K = 2 case, where α_C = 1 [?]. For
K ≥ 3, much less is known; K(≥ 3)-SAT belongs to the class of hard, NP-complete computational problems.
Studies have mainly concentrated on the K = 3 case, whose instances are simpler to generate than for larger values
of K. Some lower [20] and upper [21] bounds on α_C(3) have been derived, and numerical simulations have recently
allowed precise estimates of α_C to be found, e.g. α_C(3) ≃ 4.3 [?].

The phase transition taking place in random 3-SAT has attracted a great deal of interest over the past years due to
its close relationship with the emergence of computational complexity. Roughly speaking, instances are much harder
to solve at threshold than far from criticality. We now present the solving procedure used to tackle the 3-SAT
problem.

B. The Davis-Putnam-Loveland-Logemann solving procedure.

1. Main operations of the solving procedure and search trees. 

3-SAT is among the most difficult problems to solve as its size N becomes large. In practice, one resorts to
methods that need, a priori, exponentially large computational resources. One of these algorithms, the
Davis-Putnam-Loveland-Logemann (DPLL) solving procedure, is illustrated in Figure 1. DPLL operates by trials and
errors, the sequence of which can be graphically represented as a search tree made of nodes connected through edges
as follows:

1. A node corresponds to the choice of a variable. Depending on the value of the latter, DPLL takes one of the 
two possible edges. 

2. Along an edge, all logical implications of the last choice made are extracted. 

3. DPLL goes back to step 1 unless a solution is found or a contradiction arises; in the latter case, DPLL backtracks 
to the closest incomplete node (with a single descendent edge), inverts the attached variable and goes to step 2; 
if all nodes carry two descendent edges, unsatisfiability is proven. 

Examples of search trees for satisfiable (sat) or unsatisfiable (unsat) instances are shown in Figure [?]. Computational
complexity is the number of operations performed by DPLL, and is measured by the size of the search tree, i.e. the
number of nodes.
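The three steps above can be sketched as a compact recursive procedure. This is an illustrative sketch, not the Fortran implementation described later in the paper: clauses are tuples of signed integers (+i for x_i, -i for its negation, an assumed convention), the splitting rule is deliberately naive (first literal of the first clause), and the names `simplify`, `propagate` and `dpll` are hypothetical.

```python
def simplify(formula, literal):
    """Set literal to true: drop satisfied clauses, shorten the others.
    Returns None if some clause becomes empty (contradiction)."""
    reduced = []
    for clause in formula:
        if literal in clause:
            continue                                   # clause satisfied, eliminated
        shorter = tuple(l for l in clause if l != -literal)
        if not shorter:
            return None                                # all literals false: contradiction
        reduced.append(shorter)
    return reduced

def propagate(formula):
    """Step 2: extract the logical implications, i.e. fix literals of unit clauses."""
    while formula is not None:
        unit = next((c[0] for c in formula if len(c) == 1), None)
        if unit is None:
            break
        formula = simplify(formula, unit)
    return formula

def dpll(formula, stats):
    """Return True if formula is satisfiable; stats['nodes'] counts the split nodes."""
    formula = propagate(formula)
    if formula is None:
        return False                                   # contradiction: backtrack
    if not formula:
        return True                                    # no clause left: solution found
    stats["nodes"] += 1                                # step 1: a node = choice of a variable
    literal = formula[0][0]                            # naive splitting rule (assumption)
    return (dpll(simplify(formula, literal), stats)
            or dpll(simplify(formula, -literal), stats))   # step 3: invert the variable

if __name__ == "__main__":
    stats = {"nodes": 0}
    print(dpll([(1, 2), (-1, 2), (1, -2), (-1, -2)], stats), stats["nodes"])
```

The recursion stack plays the role of the list of incomplete nodes: returning from the first recursive call with `False` is the backtracking step, and the counter gives the size of the search tree in the sense used here.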
2. Heuristics of choice.

In the above procedure, step 1 requires choosing one literal among the variables not assigned yet. The choice of
the variable and of its value obeys some more or less empirical rules called splitting heuristics. The key idea is to
choose variables that will lead to the maximum number of logical implications [22]. Here are some simple heuristics:

• "Truth table" rule: fix unknown variables in lexicographic order, from x_1 up to x_N, and assign them to e.g. true.
This is an inefficient rule that does not follow the key principle exposed above.

• Generalized Unit-Clause (GUC) rule: choose randomly one literal among the shortest clauses [?]. This is an
extension of unit-propagation, which fixes literals in unit clauses. GUC is based on the fact that a clause of
length K needs at most K − 1 splittings to produce a logical implication. So variables are chosen preferentially
among short clauses.

• Maximal occurrence in minimum size clauses (MOMS) rule: pick the literal appearing most often in the shortest
clauses. This rule is a refinement of GUC.

Global performances of DPLL depend quantitatively on the splitting rule. From a qualitative point of view, however,
the easy-hard-easy picture emerging from experiments is very robust [7, ?, 10]. The hardest instances seem to be located
at threshold. Solving them demands an exponentially large computational effort scaling as 2^{N ω_C}. The values of ω_C
found in the literature roughly range from 0.05 to 0.1, depending on the splitting rule used by DPLL [?, 22].

In this paper, we shall focus on the GUC heuristic which is simple enough to allow analytical studies and, yet, is 
already quite efficient. 
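A possible reading of the GUC rule is sketched below, under assumptions: clauses are tuples of signed integers (+i for x_i, -i for its negation), a shortest clause is picked uniformly at random, then one of its literals is picked uniformly at random and is meant to be set to true. The function name `guc_choose_literal` is hypothetical, and the formula is assumed to be non-empty.

```python
import random

def guc_choose_literal(formula, rng=random):
    """Generalized Unit-Clause rule: a random literal from a random shortest clause.

    When unit clauses are present, the rule reduces to unit propagation,
    since the shortest clauses are then the 1-clauses.
    """
    shortest = min(len(clause) for clause in formula)
    candidates = [clause for clause in formula if len(clause) == shortest]
    clause = rng.choice(candidates)
    return rng.choice(clause)

if __name__ == "__main__":
    residual = [(1, 2, 3), (-4, 5), (6, -7, 8)]
    print(guc_choose_literal(residual, random.Random(0)))   # always -4 or 5
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when averaging search-tree sizes over many samples.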



C. 2+p-SAT and instance trajectory. 



We shall present in Section III the experimental results on solving 3-SAT instances using the DPLL procedure in a
detailed way. The main scope of this paper is to compute analytically the computational complexity ω in
the easy and hard regimes. To do so, we have made use of the precious notion of dynamical trajectory, which we now
expose.

As shown in Figure [?], the action of DPLL on an instance of 3-SAT causes the reduction of 3-clauses to 2-clauses.
Thus, a mixed 2+p-SAT distribution [13], where p is the fraction of 3-clauses, may be used to model what remains
of the input instance at a node of the search tree. A 2+p-SAT formula of parameters p, α is the logical AND of
two uncorrelated random 2-SAT and 3-SAT instances including α(1 − p)N and αpN clauses respectively. Using
experiments [13] and statistical mechanics calculations [12], the threshold line α_C(p) may be obtained, with the results
shown in Figure [?] (full line). Replica calculations suggest that the sat/unsat transition taking place at α_C(p) is
continuous if p < p_S and discontinuous if p > p_S, where p_S ≃ 0.41 [23]. The tricritical point T_S is shown in Figure [?].
Below p_S, the threshold α_C(p) coincides with the upper bound 1/(1 − p), obtained when requiring that the 2-SAT
subformula only be satisfied. Rigorous studies have shown that p_S ≥ 2/5, leaving open the possibility that it is actually
equal to 2/5 [?], or to some slightly larger value [?].

The phase diagram of 2+p-SAT is the natural space in which the DPLL dynamics takes place [17]. An input 3-SAT
instance with ratio α_0 shows up on the right vertical boundary of Figure [?] as a point of coordinates (p = 1, α_0). Under
the action of DPLL, the representative point moves aside from the 3-SAT axis and follows a trajectory. The location
of this trajectory in the phase diagram allows a precise understanding of the search tree structure and of complexity.
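To make the notion of representative point concrete, the sketch below computes the coordinates (p, α) of a residual formula, with p the fraction of 3-clauses and α the number of 2- and 3-clauses per not-yet-assigned variable, together with the 2-SAT upper bound 1/(1 − p) quoted above. The function names are hypothetical, clauses are assumed to be tuples of literals, and unit clauses are assumed to have been removed by propagation beforehand.

```python
def phase_coordinates(formula, n_unassigned):
    """Return (p, alpha) for a residual formula made of 2- and 3-clauses, or None if empty."""
    c2 = sum(1 for clause in formula if len(clause) == 2)
    c3 = sum(1 for clause in formula if len(clause) == 3)
    total = c2 + c3
    if total == 0 or n_unassigned == 0:
        return None                         # no clause or no variable left: point undefined
    return c3 / total, total / n_unassigned

def two_sat_upper_bound(p):
    """Ratio at which the 2-SAT subformula alone reaches its threshold: alpha (1 - p) = 1."""
    return float("inf") if p >= 1 else 1.0 / (1.0 - p)

if __name__ == "__main__":
    # a pure 3-SAT input sits on the p = 1 axis at its initial ratio
    start = [(1, 2, 3)] * 6
    print(phase_coordinates(start, 2))           # (1.0, 3.0)
    # after some assignments, 2-clauses appear and p decreases
    later = [(1, 2, 3)] * 3 + [(1, 2)]
    print(phase_coordinates(later, 8), two_sat_upper_bound(0.75))
```

Calling `phase_coordinates` after each assignment along a branch traces the trajectory of the instance in the (p, α) plane.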



III. NUMERICAL EXPERIMENTS. 



A. Description of the numerical implementation of the DPLL algorithm. 



We have implemented DPLL with the GUC rule, see Figure 3 and Section II B 2, to have a fast unit propagation
and an inexpensive backtracking [?]. The program is divided into three parts. The first routine draws the clauses and
represents the data in a convenient structure. The second, main routine updates and saves the state of the search, i.e.
the indices and values of assigned variables, to allow an easy backtracking. Then, it checks whether a solution is found; if
not, a new variable is assigned. The third routine extracts the implications of the choice (propagation). If unit clauses
have been generated, the corresponding literals are fixed, or a contradiction is detected.
1. Data Representation



Three arrays are used to encode the data: the first two arrays are labelled by the clause number m = 1, ..., M and
the number b = 1, ..., K of the components in the clause (with K = 2, 3). The entries of these arrays, initially drawn
at random, are the indices clausnum(m, b) and the values clausval(m, b), true or false, of the variables. The indices
of the third array are integers i = 1, ..., N and j = 1, ..., O_i, where O_i is the number of occurrences of x_i in the clauses
(from zero to M). The entries of this array, a(i, j), are the numbers of the corresponding clauses (between 1 and
M).
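The three arrays can be sketched as follows in Python. This is not the authors' Fortran code: `clausnum` and `clausval` follow the names in the text, while `occurrences` stands for the third array a(i, j). The signed-integer input encoding (+i for x_i, -i for its negation) and the function name `build_tables` are assumptions made for the illustration.

```python
def build_tables(formula, n_vars):
    """Build the clause tables and the per-variable occurrence lists.

    clausnum[m][b]  : index of the variable in component b of clause m
    clausval[m][b]  : value (True/False) of that variable which satisfies the clause
    occurrences[i]  : numbers (starting at 1) of the clauses in which x_i appears
    Python lists are 0-based for m and b; occurrences[0] is unused so that
    variable indices run from 1 to n_vars as in the text.
    """
    clausnum = [[abs(lit) for lit in clause] for clause in formula]
    clausval = [[lit > 0 for lit in clause] for clause in formula]
    occurrences = [[] for _ in range(n_vars + 1)]
    for m, clause in enumerate(formula, start=1):
        for lit in clause:
            occurrences[abs(lit)].append(m)
    return clausnum, clausval, occurrences

if __name__ == "__main__":
    num, val, occ = build_tables([(1, -2, 3), (-1, 2, 4)], n_vars=4)
    print(num, val, occ[1:])
```

The occurrence lists are what makes propagation inexpensive: after fixing x_i, only the clauses listed in `occurrences[i]` need to be revisited.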

2. Updating of the search state. 

If the third routine has found a contradiction, the second routine goes back along the branch, inverts the last 
assigned variable and calls again the third routine. If not, the descent along the branch is pursued. A Boolean-valued 
vector points to the assigned variables, while the values are stored in another unidimensional array. For each clause, 
we check if the variables are already assigned and, if so, if they are in agreement or not with the clauses. When 
splitting occurs, a new variable is fixed to satisfy a 2-clause, i.e. a clause with one false literal and two unknown 
variables, and the third subroutine is called. If there are only 3-clauses, a new variable is fixed to satisfy any 3-clause 
and the third subroutine is called. The variable chosen and its value are stored in a vector with index the length of 
the branch, i.e. the number of nodes it contains, to allow future backtracking. If there are neither 2- nor 3-clauses
left, a solution is found.

3. Consequences of a choice and unit propagation 

All clauses containing the last fixed variable are analyzed by taking into account all possibilities: 1. the clause is 
satisfied; 2. the clause is reduced to a 2- or 1-clause; 3. the clause is violated (contradiction). In the second case, the 
1-clause is stored to be analyzed by unit-propagation once all clauses containing the variable have been reviewed. 
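The case analysis above can be sketched as a small classification function. It is an illustration under assumptions: clauses are tuples of signed integers (+i for x_i, -i for its negation), the partial assignment is a dictionary from variable index to Boolean, and the name `clause_status` is hypothetical.

```python
def clause_status(clause, assignment):
    """Classify a clause under a partial assignment.

    Returns 'satisfied', 'violated', or 'k-clause' where k is the number of
    literals whose variable is still unassigned (e.g. '2-clause', '1-clause').
    """
    unknown = 0
    for lit in clause:
        value = assignment.get(abs(lit))       # None if the variable is not assigned yet
        if value is None:
            unknown += 1
        elif value == (lit > 0):
            return "satisfied"                 # case 1: one true literal is enough
    if unknown == 0:
        return "violated"                      # case 3: every literal is false
    return f"{unknown}-clause"                 # case 2: clause reduced (or untouched)

if __name__ == "__main__":
    clause = (1, -2, 3)
    print(clause_status(clause, {1: False}))                       # 2-clause
    print(clause_status(clause, {1: False, 2: True}))              # 1-clause
    print(clause_status(clause, {1: False, 2: True, 3: False}))    # violated
```

A clause reported as a 1-clause is the one to be queued for unit propagation once all clauses containing the last fixed variable have been reviewed.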

B. Characteristic running times 

We have implemented the DPLL search algorithm in Fortran 77; the algorithm runs on a Pentium II PC with a
433 MHz clock frequency. The number of nodes added per minute ranges from 300,000 (typically obtained for α = 3.5)
to 100,000 (α = 10) since unit propagation is more and more frequent as α increases. The orders of magnitude of
the computational time needed to solve an instance are listed in Table I for ratios corresponding to hard instances.
These times limit the maximal size N of the instances we have experimentally studied and the number of samples
over which we have averaged. Some rare instances may be much harder than the typical times indicated in Table I suggest.
For instance, for α = 3.1 and N = 500, instances are usually solved in about 4 minutes but some samples required
more than 2 hours of computation. Such a phenomenon will be discussed in Section III D 1.



C. Overview of experiments 

1. Number of nodes of the search tree 
We have first investigated complexity by a direct count of the number of splittings, that is the number of nodes
(black points) in Figure [?], for sat (Figure [?]A,C) and unsat (Figure [?]B) trees.

2. Histogram of branch length. 

We have also experimentally investigated the structure of search trees for unsat instances (Figure [?]B). A branch
is defined as a path joining the root (first node on the top of the search tree) to a leaf marked with a contradiction
C (or a solution S for a sat instance) in Figure [?]. The length of a branch is the number of nodes it contains. For an
unsat instance, the complete search tree depends on the variables chosen at each split, but not on the values they
are assigned to. Indeed, to prove that there is no solution, all nodes in the search tree have to carry two outgoing
branches, corresponding to the two choices of the attached variables. Which choice is made first does not matter. This
simple remark will be of crucial importance in the theoretical analysis of Section V B 1.



We have derived the histogram of the branch lengths by counting the number B(l) of branches having length lN
once the tree is entirely built up. The histogram is very useful to deduce the complexity in the unsat phase, since in
a complete tree the total number of branches B is related to the number of nodes Q through the identity,

$$ B = \sum_{l = 1/N, 2/N, \ldots, 1} B(l) = Q + 1 , \tag{1} $$

that can be inferred from Figure [?]B.
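The relation between the number of branches and the number of nodes in a complete tree can be checked numerically on synthetic trees. The sketch below does not run DPLL: it grows a random tree in which every split node carries two outgoing branches and every leaf closes one branch, which is the only property the identity relies on. The function name `grow_tree`, the stopping probability 0.3 and the depth cap are arbitrary choices made for the illustration.

```python
import random

def grow_tree(rng, depth, max_depth):
    """Return (nodes, branches) of a random tree where each split node has two children.

    A leaf stands for a contradiction and closes exactly one branch;
    nodes counts the split nodes only, as in the text.
    """
    if depth == max_depth or rng.random() < 0.3:
        return 0, 1
    left_nodes, left_branches = grow_tree(rng, depth + 1, max_depth)
    right_nodes, right_branches = grow_tree(rng, depth + 1, max_depth)
    return 1 + left_nodes + right_nodes, left_branches + right_branches

if __name__ == "__main__":
    for seed in range(5):
        nodes, branches = grow_tree(random.Random(seed), 0, 12)
        print(nodes, branches, branches == nodes + 1)
```

The identity follows by induction: a single leaf has 0 nodes and 1 branch, and joining two subtrees under a new split node adds one node while summing the branches.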



3. Highest backtracking point 

Another key property we have focused upon is the highest backtracking point in the search tree. In the unsat phase,
DPLL backtracks through all the nodes of the tree since no solution can be present. The highest backtracking point in the
tree simply coincides with the top (root) node. In the sat phase, the situation is more involved. A solution generally
requires some backtracking and the highest backtracking node may be defined as the closest node to the origin
through which two branches pass, node G in Figure [?]B. We experimentally keep track of the highest backtracking
point by measuring the numbers C_2(G), C_3(G) of 2- and 3-clauses and the number of not-yet-assigned variables N(G),
and computing the coordinates p_G = C_3(G) / (C_2(G) + C_3(G)), α_G = (C_2(G) + C_3(G)) / N(G) of G in the phase diagram
of Figure [?].



D. Experimental Results 

1. Fluctuations of complexity. 

The size of the search tree built by DPLL is a random variable, due to the (quenched) randomness of the 3-SAT
instance and the choices made by the splitting rule ("thermal noise"). We show in Figure [?] the distribution of the
logarithms (in base 2, and divided by N) of the number of nodes for different values of N. The distributions are
more and more peaked around their mean values ω_N(α) as the size N increases. This indicates that the logarithm
of the complexity is a self-averaging quantity in the thermodynamic limit. However, fluctuations are dramatically
different at low and large ratios. For α = 10, and more generally in the unsat phase, the distributions are roughly
symmetric (Figure [?]A). Tails are small and complexity does not fluctuate too much from sample to sample. In
the vicinity of α_L, e.g. α = 3.1, much bigger fluctuations are present. There are large tails on the right flanks of the
distributions in Figure [?]B, due to the presence of rare and very hard samples [17]. Complexity is not self-averaging.
We will come back to this point in Section [?].



2. The easy-hard-easy pattern. 

We have averaged the logarithm (in base 2, and divided by N) of the number of nodes over 10,000 randomly drawn
instances to obtain ω_N(α). The typical size Q of the search tree is simply Q = 2^{N ω_N}. Results are shown in Figure 1.
An easy-hard-easy pattern of complexity appears clearly as the ratio α varies.

• At small ratios, complexity increases as γ(α) N, that is, only linearly with the size of the instance. DPLL easily
finds a solution and the search tree essentially reduces to a single branch shown in Figure [?]A. For the GUC
heuristic, the linear regime extends up to α_L ≃ 3.003 [18, 20].

• Above threshold, complexity grows exponentially with N [28]. The logarithm ω(α), limit of ω_N(α) as N → ∞,
is maximal at criticality...

• ... and decreases at large ratios as 1/α [?]. The "easy" region on the right hand side of Figure 1 is still
exponential but with small coefficients ω.



Of particular interest is the intermediate region α_L < α < α_C. We shall show that complexity is exponential in
this range of ratios, and that the search tree is a mixture of the search trees taking place in the other ranges α < α_L
and α > α_C.

Let us mention that, while this paper is devoted to typical-case (happening with probability one) complexity,
rigorous results have been obtained that apply to any instance. So far, using a refined version of DPLL, any instance
is guaranteed to be solved in less than 1.504^N steps, i.e. ω < 0.588. The reader is referred to reference [30] for this
worst-case analysis.



3. Lower sat phase (α < α_L)

The complexity data of Figure 1 obtained for different sizes N = 50, 75, 100 are plotted again in Figure [?] after
division by N. Data collapse on a single curve, showing that complexity is linear in the regime α < α_L. In the vicinity
of the crossover ratio α = α_L, finite-size effects become important. We have thus performed extra
simulations for larger sizes N = 500, 1000 in the range 2.5 < α < 3, which confirm the linear scaling of the complexity.



4. Unsat phase (α > α_C)

Results for the shape of the search trees are shown in Figure [?]. We represent the logarithm b(l), in base 2 and
divided by N, of the number B(l) of branches as a function of the branch length l, averaged over many samples and
for different sizes N and ratios α. When α increases at fixed N, branches are shorter and shorter and less and less
numerous, making complexity decrease (Figure 1).

As N gets large at fixed α, the histogram b(l) becomes a smooth function of l and we can replace the discrete sum
in (1) with a continuous integral on the length,

$$ Q + 1 = N \int_0^1 dl \; 2^{N b(l)} . \tag{2} $$

The integral is exponentially dominated by the maximal value b_max of b(l). ω, the limit of the logarithm of the
complexity divided by N, is therefore equal to b_max. Nicely indeed, the height b_max of the histogram does not depend
on N (within the statistical errors) and gives a straightforward and precise estimate of ω, not affected by finite-size
effects. The values of b_max as a function of α are listed in the third column of Table II.

The above discussion is also very useful to interpret the data on the size Q of the search trees. From the quadratic
correction around the saddle-point, b(l) ≃ b_max − β (l − l_max)^2 / 2, the expected finite size correction to ω_N reads

$$ \omega_N \simeq b_{max} + \frac{\log_2 N}{2N} + \frac{1}{2N} \log_2 \left[ \frac{2\pi}{\beta \ln 2} \right] . \tag{3} $$

We have checked the validity of this equation by fitting ω_N − log_2 N/(2N) as a polynomial function of 1/N. The
constant at the origin gives values of ω in very good agreement with b_max (second column in Table II) while the
linear term gives access to the curvature β. We compare in Table III this curvature with the direct measurements of
β obtained by looking at the vicinity of the top of the histogram. The agreement is fair, showing that equation (3) is
an accurate way of extrapolating data on Q to the infinite size limit.
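For the reader's convenience, here is a sketch of how this finite-size correction follows from the integral over branch lengths. It assumes that B(l) = 2^{N b(l)}, that the histogram is well approximated near its top by b(l) ≃ b_max − β (l − l_max)^2 / 2, and that the Gaussian integral can be extended to the whole real axis; these are the usual assumptions of a saddle-point estimate, not additional results.

$$
\begin{aligned}
Q + 1 &= N \int_0^1 dl \; 2^{N b(l)}
 \simeq N \, 2^{N b_{max}} \int_{-\infty}^{+\infty} dl \, \exp\left[ -\frac{N \beta \ln 2}{2} \, (l - l_{max})^2 \right]
 = N \, 2^{N b_{max}} \sqrt{\frac{2\pi}{N \beta \ln 2}} , \\
\omega_N &= \frac{1}{N} \log_2 Q
 \simeq b_{max} + \frac{\log_2 N}{N} - \frac{\log_2 N}{2N} + \frac{1}{2N} \log_2 \left[ \frac{2\pi}{\beta \ln 2} \right]
 = b_{max} + \frac{\log_2 N}{2N} + \frac{1}{2N} \log_2 \left[ \frac{2\pi}{\beta \ln 2} \right] .
\end{aligned}
$$

The prefactor N comes from converting the sum over lengths l = 1/N, 2/N, ..., 1 into an integral, and half of the resulting log_2 N / N term is cancelled by the width of the Gaussian peak.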



5. Upper sat phase (α_L < α < α_C)



To investigate the sat region slightly below threshold, α_L < α < α_C, we have carried out simulations with a starting
ratio α = 3.5. Results are shown in Figure [?]A. As instances are sat with a high probability, no simple identity relates
the number of nodes Q to the number of branches B, see the search tree in Figure [?]C, and we measure the complexity
through Q only. Complexity clearly scales exponentially with the size of the instance and exhibits large fluctuations
from sample to sample. The annealed complexity (logarithm of the average complexity), ω^ann, is larger than the
typical solving hardness (average of the logarithm of the complexity), ω^typ, see Table IV.
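The difference between the two averages can be illustrated with a few lines of code. The sketch below takes a list of measured search-tree sizes Q for instances with the same number N of variables and returns both quantities, each expressed in base 2 and divided by N as elsewhere in the text. The function name `annealed_and_typical` and the sample values are made up for the illustration; they are not data from the paper.

```python
import math

def annealed_and_typical(sizes, n_vars):
    """Return (omega_ann, omega_typ) from a list of search-tree sizes Q.

    omega_ann : log2 of the average of Q, divided by N (dominated by rare hard samples)
    omega_typ : average of log2 Q, divided by N (typical-case behaviour)
    """
    mean_q = sum(sizes) / len(sizes)
    omega_ann = math.log2(mean_q) / n_vars
    omega_typ = sum(math.log2(q) for q in sizes) / (len(sizes) * n_vars)
    return omega_ann, omega_typ

if __name__ == "__main__":
    # two "ordinary" samples and one rare, very hard sample (illustrative numbers)
    sizes = [2**10, 2**10, 2**30]
    print(annealed_and_typical(sizes, n_vars=100))
```

Since the logarithm is concave, the annealed value can never be smaller than the typical one (Jensen's inequality); a large gap between the two signals strong sample-to-sample fluctuations.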

To reach a better understanding of the structure of the search tree, we have focused on the highest backtracking
point G defined in Figure [?] and Section III C 3. The coordinates p_G, α_G of point G, averaged over instances, are
shown for increasing sizes N in Figure [?]. The coordinates of G exhibit strong sample fluctuations which make the
large N extrapolation, p_G = 0.78 ± 0.01, α_G = 3.02 ± 0.02, rather imprecise.

In Section VI, we shall show how the solving complexity in the upper sat phase is related to the solving complexity
of corresponding 2+p-SAT problems with parameters p_G, α_G.



IV. BRANCH TRAJECTORIES AND THE LINEAR REGIME (LOWER SAT PHASE). 



In this section, we investigate the dynamics of DPLL in the low ratio regime, where a solution is rapidly found (in
a linear time) and the search tree essentially reduces to a single branch shown in Figure [?]. We start with some general
comments on the dynamics induced by DPLL (section IV A), useful to understand how the trajectory followed by
the instance can be computed in the p, α plane (section IV B). These first two sections merely expose some previous
works by Chao and Franco, and the reader is asked to consult [18] for more details. In the last section IV C, our
numerical and analytical results for the solving complexity are presented.

In this Section, as well as in Sections V and VI, the ratio of the 3-SAT instance to be solved will be denoted by
α_0.



A. Remarks on the dynamics of clauses. 

1. Dynamical flows of populations of clauses. 



As pointed out in Section II C, under the action of DPLL, some clauses are eliminated while other ones are reduced.
Let us call $C_j(T)$ the number of clauses of length $j$ (including $j$ variables), once $T$ variables have been assigned by
the solving procedure. $T$ will be called hereafter "time", not to be confused with the computational time necessary
to solve a given instance. At time $T = 0$, we obviously have $C_3(0) = \alpha_0 N$, $C_2(0) = C_1(0) = 0$. As Boolean
variables are assigned, the time $T$ increases and clauses of length one or two are produced. A sketchy picture of DPLL
dynamics at some instant $T$ is proposed in Figure 10.

We call $e_1, e_2, e_3$ and $w_2, w_1$ the flows of clauses represented in Figure 10 when time increases from $T$ to $T + 1$,
that is, when one more variable is chosen by DPLL after $T$ have already been assigned. The evolution equations for
the three populations of clauses read,

$$
\begin{aligned}
C_3(T+1) &= C_3(T) - e_3(T) - w_2(T) \\
C_2(T+1) &= C_2(T) - e_2(T) + w_2(T) - w_1(T) \\
C_1(T+1) &= C_1(T) - e_1(T) + w_1(T) .
\end{aligned} \tag{4}
$$

The flows $e_j$ and $w_j$ are of course random variables that depend on the instance under consideration at time $T$, and
on the choice of the variable (label and value) made by DPLL. For a single descent, i.e. in the absence of backtracking,
the evolution process (4) is Markovian and unbiased. The distribution of instances generated by DPLL at time $T$ is
uniform over the set of all the instances having $C_j(T)$ clauses of length $j = 1, 2, 3$ and drawn from a set of $N - T$
variables Eql. 



2. Concentration of distributions of populations. 

As a result of the additivity of (4), some concentration phenomenon takes place in the large size limit. The numbers
of clauses of lengths 2 and 3, a priori extensive in $N$, do not fluctuate too much,

$$ C_j(T) = N\, c_j\!\left(\frac{T}{N}\right) + o(N) \qquad (j = 2, 3), \tag{5} $$

where the $c_j$'s are the densities of clauses of length $j$ averaged over the instance (quenched disorder) and the choices of
variables ("thermal" disorder). In other words, the populations of 2- and 3-clauses are self-averaging quantities and we
shall attempt to calculate their mean densities only. Note that, in order to prevent the occurrence of contradictions,
the number of unitary clauses must remain small and the density $c_1$ of 1-clauses has to vanish.
3. Time scales separation and deterministic vs. stochastic evolutions.

Formula (5) also illustrates another essential feature of the dynamics of clause populations. Two time scales are at
play. The short time scale, of the order of unity, corresponds to the fast variations of the numbers of clauses $C_j(T)$
($j = 1, 2, 3$). When time increases from $T$ to $T + O(1)$ (with respect to the size $N$), all $C_j$'s vary by $O(1)$ amounts.
Consequently, the densities of clauses $c_j$, that is, their numbers divided by $N$, are not modified. The densities $c_j$
evolve on a long time scale of the order of $N$ and depend on the reduced time $t = T/N$ only.

Due to the concentration phenomenon underlined above, the densities $c_j(t)$ will evolve deterministically with the
reduced time $t$. We shall see below how Chao and Franco calculated their values. On the short time scale, the relative
numbers of clauses $D_j(T) = C_j(T) - N c_j(T/N)$ fluctuate (with amplitude $\ll N$) and are stochastic variables. As
said above, the evolution process for these relative numbers of clauses is Markovian and the probability rates (master
equation) are functions of slow variables only, i.e. of the reduced time $t$ and of the densities $c_2$ and $c_3$. As a
consequence, on intermediary time scales, much larger than unity and much smaller than $N$, the $D_j$'s may reach some
stationary distribution that depends upon the slow variables.

This situation is best exemplified in the case $j = 1$ where $c_1(t) = 0$ as long as no contradiction occurs and
$D_1(T) = C_1(T)$. Consider for instance a time delay $1 \ll \Delta T \ll N$, e.g. $\Delta T = \sqrt{N}$. For times $T$ lying in between
$T_0 = tN$ and $T_1 = T_0 + \Delta T = tN + \sqrt{N}$, the numbers of 2- and 3-clauses fluctuate but their densities are left
unchanged and equal to $c_2(t)$ and $c_3(t)$. The number of 1-clauses fluctuates and follows some master equation
whose transition rates (from $C_1' = C_1(T)$ to $C_1 = C_1(T+1)$) define a matrix $\mathcal{M}(C_1, C_1')$ and depend on $t, c_2, c_3$
only. $\mathcal{M}$ has generally a single eigenvector $\mu(C_1)$ with eigenvalue unity, called equilibrium distribution, and other
eigenvectors with smaller eigenvalues (in modulus). Therefore, at time $T_1$, $C_1$ has forgotten the "initial condition"
$C_1(T_0)$ and is distributed according to the equilibrium distribution $\mu(C_1)$ of the master equation.

To sum up, the dynamical evolution of the clause populations may be seen as a slow and deterministic evolution 
of the clause densities to which are superimposed fast, small fluctuations. The equilibrium distribution of the latter 
adiabatically follows the slow trajectory. 



B. Mathematical analysis. 



In this section, we expose Chao and Franco's calculation of the densities of 2- and 3-clauses. 



1. Differential equations for the densities of clauses. 

Consider first the evolution equation (4) for the number of 3-clauses. This can be rewritten in terms of the average
density $c_3$ of 3-clauses and of the reduced time $t$,

$$ \frac{dc_3(t)}{dt} = -z_3(t), \tag{6} $$

where $z_3 = \langle e_3 + w_2 \rangle$ denotes the averaged total outflow of 3-clauses (Section IV A 2).

At some time step $T \to T + 1$, 3-clauses are eliminated or reduced if and only if they contain the variable chosen by
DPLL. Let us first suppose that the variable is chosen in some 1- or 2-clause. A 3-clause will include this variable or
its negation with probability $3/(N - T)$ and disappear with the same probability. Since clauses are uncorrelated,
we obtain $z_3(t) = 3 c_3(t)/(1 - t)$. If the literal assigned by DPLL is chosen among some 3-clause, this result has to be
increased by one (since this clause will necessarily be eliminated) in the large $N$ limit.

Let us call $\rho_j(t)$ the probability that a literal is chosen by DPLL in a clause of length $j$ (= 1, 2, 3). The normalization
of probability imposes that

$$ \rho_1(t) + \rho_2(t) + \rho_3(t) = 1, \tag{7} $$

at any time $t$. Extending the above discussion to 2-clauses, we obtain

$$
\begin{aligned}
\frac{dc_3(t)}{dt} &= -\frac{3}{1-t}\, c_3(t) - \rho_3(t), \\
\frac{dc_2(t)}{dt} &= \frac{3}{2(1-t)}\, c_3(t) - \frac{2}{1-t}\, c_2(t) - \rho_2(t).
\end{aligned} \tag{8}
$$







In order to solve the above set of coupled differential equations, we need to know the probabilities $\rho_j$. As we shall
see, the values of the $\rho_j$ can directly be deduced from the heuristic of choice, the so-called generalized unit-clause
(GUC) rule exposed in Section II B 2.

The solutions of the differential equations (8) will be expressed in terms of the fraction $p$ of 3-clauses and the ratio
$\alpha$ of clauses per variable using the identities

$$ p(t) = \frac{c_3(t)}{c_2(t) + c_3(t)}, \qquad \alpha(t) = \frac{c_2(t) + c_3(t)}{1 - t}. \tag{9} $$



2. Solution for $\alpha_0 < 2/3$.

When DPLL is launched, 2-clauses are created with an initial flow $\langle w_2(0) \rangle = 3\alpha_0/2$. Let us suppose that $\alpha_0 < 2/3$,
i.e. $\langle w_2(0) \rangle < 1$. In other words, less than one 2-clause is created each time a variable is assigned. Since the GUC
rule compels DPLL to look for literals in the smallest available clauses, 2-clauses are immediately removed just after
creation and do not accumulate in their recipient. Unitary clauses are almost absent and we have

$$ \rho_1(t) = 0; \quad \rho_2(t) = \frac{3\, c_3(t)}{2(1-t)}; \quad \rho_3(t) = 1 - \rho_2(t) \qquad (\alpha_0 < 2/3). \tag{10} $$

The solutions of (8) with the initial condition $p(0) = 1$, $\alpha(0) = \alpha_0$ read

$$ p(t) = 1, \qquad \alpha(t) = (\alpha_0 + 2)(1-t)^{3/2} - 2(1-t). \tag{11} $$

Solution (11) confirms that the instance never contains an extensive number of 2-clauses. At some final time $t_{end}$,
depending on the initial ratio, $\alpha(t_{end})$ vanishes: no clause is left and a solution is found.
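As a sanity check on this regime, the following minimal sketch integrates the first line of (8) with the GUC probabilities (10) by a plain Euler scheme, and compares the time at which the density of 3-clauses vanishes with the zero of the right-hand side of (11), $t_{end} = 1 - 4/(\alpha_0+2)^2$. The function names and the step size are illustrative choices, not part of the original analysis.

```python
def end_time_numeric(alpha0, dt=1e-5):
    """Euler integration of dc3/dt = -3 c3/(1-t) - rho3, with rho3 = 1 - rho2
    and rho2 = 3 c3 / (2 (1-t)) as in (10), until no 3-clause is left."""
    t, c3 = 0.0, alpha0            # c3(0) = alpha0; 2-clauses do not accumulate
    while c3 > 0.0:
        rho2 = 1.5 * c3 / (1.0 - t)
        rho3 = 1.0 - rho2
        c3 += dt * (-3.0 * c3 / (1.0 - t) - rho3)
        t += dt
    return t


def end_time_closed_form(alpha0):
    """Zero of (alpha0 + 2) (1-t)^(3/2) - 2 (1-t), the right-hand side of (11)."""
    return 1.0 - 4.0 / (alpha0 + 2.0) ** 2


if __name__ == "__main__":
    for alpha0 in (0.2, 0.4, 0.6):   # all below 2/3, where (10) applies
        print(alpha0, end_time_numeric(alpha0), end_time_closed_form(alpha0))
```

The two columns should agree up to the discretization error of the Euler step.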



3. Solution for $2/3 < \alpha_0 < \alpha_L$.

We now assume that $\alpha_0 > 2/3$, i.e. $\langle w_2(0) \rangle > 1$. In other words, more than one 2-clause is created each time a
variable is assigned. 2-clauses now accumulate, and give rise to unitary clauses. Due to the GUC prescription, in
presence of 1- or 2-clauses, a literal is never chosen in a 3-clause. Thus,

$$ \rho_1(t) = \frac{c_2(t)}{1-t}; \quad \rho_2(t) = 1 - \rho_1(t); \quad \rho_3(t) = 0 \qquad (\alpha_0 > 2/3), \tag{12} $$

as soon as $t > 0$. The solutions of (8) now read

$$ p(t) = \frac{4\alpha_0 (1-t)^2}{\alpha_0 (1-t)^2 + 3\alpha_0 + 4\ln(1-t)}, \qquad \alpha(t) = \frac{\alpha_0}{4}(1-t)^2 + \frac{3\alpha_0}{4} + \ln(1-t). \tag{13} $$

Solution (13) requires that the instance contains an extensive number of 2-clauses. This is true at small times since
$p'(0) = 1/\alpha_0 - 3/2 < 0$. At some time $t^* > 0$, depending on the initial ratio, $p(t^*)$ reaches back unity: no 2-clauses are
left and hypothesis (12) breaks down. DPLL has therefore reduced the initial formula to a smaller 3-SAT instance
with a ratio $\alpha^* = \alpha(t^*)$. It can be shown that $\alpha^* < 2/3$. Thus, as the dynamical process is Markovian, the further
evolution of the instance can be calculated from Section IV B 2.
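The deterministic densities of this regime can be compared with a direct stochastic simulation of the three clause populations. The sketch below is only an illustration under stated assumptions: the flows are drawn from Poisson laws (the large $N$ limit of the binomial flows), contradictions among 1-clauses are ignored, and the names `simulate_descent` and `trajectory_13` are hypothetical helpers introduced here. The reference values are $p(t)$ and $\alpha(t)$ as given by (13), converted to densities through (9).

```python
import math
import random


def poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_descent(alpha0, n_vars, t_stop, seed=0):
    """One DPLL descent without backtracking, tracked through (C3, C2, C1) only."""
    rng = random.Random(seed)
    c3, c2, c1 = int(alpha0 * n_vars), 0, 0
    for T in range(int(t_stop * n_vars)):
        free = n_vars - T
        # GUC rule: the literal is taken from the shortest clause available,
        # and that clause is satisfied (removed).
        if c1 > 0:
            c1 -= 1
        elif c2 > 0:
            c2 -= 1
        elif c3 > 0:
            c3 -= 1
        # Clauses containing the chosen variable: half eliminated, half reduced.
        out3 = min(c3, poisson(rng, 3.0 * c3 / free))
        w2 = sum(rng.random() < 0.5 for _ in range(out3))
        out2 = min(c2, poisson(rng, 2.0 * c2 / free))
        w1 = sum(rng.random() < 0.5 for _ in range(out2))
        c3 -= out3
        c2 += w2 - out2
        c1 += w1
    return c3 / n_vars, c2 / n_vars, c1


def trajectory_13(alpha0, t):
    """p(t) and alpha(t) from (13), valid for 2/3 < alpha0 and t < t*."""
    alpha = 0.25 * alpha0 * (1 - t) ** 2 + 0.75 * alpha0 + math.log(1 - t)
    p = alpha0 * (1 - t) ** 2 / alpha
    return p, alpha


if __name__ == "__main__":
    alpha0, t = 2.0, 0.3
    c3, c2, c1 = simulate_descent(alpha0, 40000, t)
    p, alpha = trajectory_13(alpha0, t)
    print("simulated c3, c2:", c3, c2, " 1-clauses left:", c1)
    print("from (13), (9)  :", p * alpha * (1 - t), (1 - p) * alpha * (1 - t))
```

The number of 1-clauses returned at the end stays of order one, in line with the requirement that their density vanishes.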



4. Trajectories in the $p, \alpha$ plane.

We show in Figure [| the trajectories obtained for initial ratios $\alpha_0 = 0.6$, $\alpha_0 = 2$ and $\alpha_0 = 2.8$. When $\alpha_0 > 2/3$, the
trajectory first heads to the left (Section IV B 3) and then reverses to the right until reaching a point on the 3-SAT
axis at small ratio $\alpha^* (< 2/3)$ without ever leaving the sat region. Further action of DPLL leads to a rapid elimination
of the remaining clauses and the trajectory ends up at the right lower corner S, where a solution is achieved (Section
IV B 2).

As $\alpha_0$ increases up to $\alpha_L$, the trajectory gets closer and closer to the threshold line $\alpha_C(p)$. Finally, at $\alpha_L \simeq 3.003$,
the trajectory touches the threshold curve tangentially at point T with coordinates ($p_T = 2/5$, $\alpha_T = 5/3$). Note the
identity $\alpha_T = 1/(1 - p_T)$.
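The value of $\alpha_L$ and the coordinates of T can be recovered numerically from the trajectory (13). The sketch below is an illustration, not the authors' procedure: it scans each trajectory on a grid of times for the maximum of $\alpha(t)(1-p(t))$ and bisects on $\alpha_0$ until this maximum equals one, i.e. until the trajectory touches the line $\alpha = 1/(1-p)$. Grid size, bracketing interval and function names are assumptions made for the example.

```python
import math


def alpha_of_t(alpha0, t):
    return 0.25 * alpha0 * (1 - t) ** 2 + 0.75 * alpha0 + math.log(1 - t)


def closest_approach(alpha0, n_grid=5000, t_max=0.99):
    """Largest value of alpha (1 - p) along trajectory (13), with its (p, alpha)."""
    best = (-1.0, None, None)
    for k in range(1, n_grid + 1):
        t = t_max * k / n_grid
        a = alpha_of_t(alpha0, t)
        if a <= 0.0:
            break
        p = alpha0 * (1 - t) ** 2 / a
        if p > 1.0:            # back on the 3-SAT axis: regime (12) has ended
            break
        value = a * (1.0 - p)
        if value > best[0]:
            best = (value, p, a)
    return best


def find_alpha_L(lo=2.5, hi=3.5, n_iter=40):
    """Bisection on alpha0 for max_t alpha (1 - p) = 1."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if closest_approach(mid)[0] < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    alpha_L = find_alpha_L()
    _, p_T, alpha_T = closest_approach(alpha_L)
    print(alpha_L, p_T, alpha_T)
```

Up to the grid resolution, the output should reproduce $\alpha_L \simeq 3.003$ and the tangency point $(2/5, 5/3)$ quoted above.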

C. Complexity. 

In this section, we compute the computational complexity in the range $0 < \alpha_0 < \alpha_L$, from the previous results.

1. Absence of backtracking. 



The trajectories obtained in Section IV B 4 represent the deterministic evolution of the densities of 2- and 3-clauses
when more and more variables are assigned. Equilibrium fluctuations of the number of 1-clauses have been computed by
Frieze and Suen [20]. The stationary distribution $\mu_t(C_1)$ of the population of 1-clauses can be exactly computed at
any time $t$. The most important result is the probability that $C_1$ vanishes,

$$ \mu_t(0) = 1 - \alpha(t)\,(1 - p(t)). \tag{14} $$

$\mu_t(0)$ (respectively $1 - \mu_t(0)$) may be interpreted as the probability that a variable assigned by DPLL at time $t$ is
chosen through splitting (resp. unit-propagation). When DPLL starts solving a 3-SAT instance, $\mu_{t=0}(0) = 1$ and
many splits are necessary. If the initial ratio $\alpha_0$ is smaller than 2/3, this statement remains true till the final time
$t_{end}$ and the absence of 1-clauses prevents the onset of contradiction. Conversely, if $2/3 < \alpha_0 < \alpha_L$, as $t$ grows, $\mu_t(0)$
decreases and more and more variables are fixed through unit-propagation. The population of 1-clauses remains finite
and the probability that a contradiction occurs when a new variable is assigned is $O(1/N)$ only. However small
this probability is, $O(N)$ variables are fixed along a complete trajectory. The resulting probability that a contradiction
never occurs is strictly smaller than unity [20],

$$ P_{\text{No Contradiction}} = \exp\left( -\frac{1}{4} \int_0^{t^*} \frac{dt}{1-t}\, \frac{(1 - \mu_t(0))^2}{\mu_t(0)} \right). \tag{15} $$

Frieze and Suen have shown that contradictions have no dramatic consequences. The number of backtrackings
necessary to find a solution is bounded from above by a power of $\log N$. The final trajectory in the $p, \alpha$ plane is
identical to the one shown in Section IV B 4 and the increase of complexity is negligible with respect to $O(N)$.

When $\alpha_0$ reaches $\alpha_L$, the trajectory intersects the $\alpha = 1/(1 - p)$ line in T. At this point, $\mu(0)$ vanishes and
backtracking enters massively into play, signaling the cross-over to the exponential regime.

2. Length of trajectories. 

From the above discussion, it appears that a solution is found by DPLL essentially at the end of a single descent
(Figure |^A). Complexity thus scales linearly with $N$ with a proportionality coefficient $\gamma(\alpha_0)$ smaller than unity.

For $\alpha_0 < 2/3$, clauses of length unity are never created by DPLL (Section IV B 2). Thus, DPLL assigns the
overwhelming majority of variables by splittings. $\gamma(\alpha_0)$ simply equals the total fraction $t_{end}$ of variables chosen by
DPLL. From (11), we obtain

$$ \gamma(\alpha_0) = 1 - \frac{4}{(\alpha_0 + 2)^2} \qquad (\alpha_0 < 2/3). \tag{16} $$



For larger ratios, i.e. $\alpha_0 > 2/3$, the trajectory must be decomposed into two successive portions (Section IV B 3).
During the first portion, for times $0 < t < t^*$, 2-clauses are present with a non vanishing density $c_2(t)$. Some of these
2-clauses are reduced to 1-clauses that have to be eliminated next. Consequently, when DPLL assigns an infinitesimal
fraction $dt$ of variables, a fraction $\rho_1(t)\, dt = \alpha(t)(1 - p(t))\, dt$ are fixed by unit-propagation only. The number of nodes
(divided by $N$) along the first part of the branch thus reads,

$$ \gamma_1 = t^* - \int_0^{t^*} dt\; \alpha(t)\,(1 - p(t)). \tag{17} $$

At time $t^*$, the trajectory touches back the 3-SAT axis $p = 1$ at ratio $\alpha^* = \alpha(t^*) < 2/3$. The initial instance is
then reduced to a smaller and smaller 3-SAT formula, with a ratio $\alpha(t)$ vanishing at $t_{end}$. According to the above
discussion, the length of this second part of the trajectory equals

$$ \gamma_2 = t_{end} - t^*. \tag{18} $$



It is convenient to plot the total complexity $\gamma = \gamma_1 + \gamma_2$ in a parametric way. To do so, we express the initial
ratio $\alpha_0$ and the complexity $\gamma$ in terms of the end time $t^*$ of the first part of the branch. A simple calculation from
equations (13) leads to

$$
\begin{aligned}
\alpha_0(t^*) &= -\frac{4 \ln(1 - t^*)}{3\, t^* (2 - t^*)}, \\
\gamma(t^*) &= 1 - \frac{4 (1 - t^*)}{\left( 2 + (1 - t^*)^2\, \alpha_0(t^*) \right)^2} + t^* + (1 - t^*) \ln(1 - t^*) - \frac{1}{4}\, \alpha_0(t^*)\, (t^*)^2 (3 - t^*).
\end{aligned} \tag{19}
$$

As $t^*$ grows from zero to $t^*_L \simeq 0.892$, the initial ratio $\alpha_0$ spans the range $[2/3; \alpha_L]$. The complexity coefficient $\gamma(\alpha_0)$
can be computed from equations (16, 19), with the results shown in Figure pi. The agreement with the numerical data of
Section III D 3 is excellent.
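The parametric representation (19) is straightforward to tabulate. The sketch below evaluates $\alpha_0(t^*)$ and $\gamma(t^*)$, and cross-checks (19) against a direct quadrature of (17) plus the second portion (18). For the latter we assume, as an illustration, that (16) applies to the residual 3-SAT instance with ratio $\alpha^* = \alpha_0 (1-t^*)^2$ and $N(1-t^*)$ variables; function names are hypothetical.

```python
import math


def alpha0_of_tstar(ts):
    """First line of (19)."""
    return -4.0 * math.log(1.0 - ts) / (3.0 * ts * (2.0 - ts))


def gamma_of_tstar(ts):
    """Second line of (19)."""
    a0 = alpha0_of_tstar(ts)
    return (1.0 - 4.0 * (1.0 - ts) / (2.0 + (1.0 - ts) ** 2 * a0) ** 2
            + ts + (1.0 - ts) * math.log(1.0 - ts)
            - 0.25 * a0 * ts ** 2 * (3.0 - ts))


def gamma_low_ratio(a0):
    """Equation (16), valid below 2/3."""
    return 1.0 - 4.0 / (a0 + 2.0) ** 2


def gamma_by_quadrature(ts, n=20000):
    """gamma1 from (17) by the midpoint rule, plus gamma2 from (18)."""
    a0 = alpha0_of_tstar(ts)
    h = ts / n
    integral = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        # alpha(t) (1 - p(t)) = alpha(t) - alpha0 (1-t)^2, using (13)
        integral += (0.75 * a0 * (1.0 - (1.0 - t) ** 2) + math.log(1.0 - t)) * h
    gamma1 = ts - integral
    alpha_star = a0 * (1.0 - ts) ** 2
    gamma2 = (1.0 - ts) * gamma_low_ratio(alpha_star)   # assumption, see lead-in
    return gamma1 + gamma2


if __name__ == "__main__":
    for ts in (0.1, 0.3, 0.5, 0.7, 0.892):
        print(ts, alpha0_of_tstar(ts), gamma_of_tstar(ts), gamma_by_quadrature(ts))
```

In the limit $t^* \to 0$ the parametric curve joins (16) at $\alpha_0 = 2/3$, and $t^* = 0.892$ gives back $\alpha_0 \simeq 3.003$.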



V. TREE TRAJECTORIES AND THE EXPONENTIAL REGIME (UNSAT PHASE). 



To present our analytical study of the exponentially large search trees generated when solving hard instances, we 
consider first a simplified growth tree dynamics in which variables, on each branch, are chosen independently of the 
1- or 2-clauses and all branches split at each depth T. This toy model is too simple a growth process to reproduce 
a search tree analogous to the ones generated by DPLL on unsat instances. In particular, it lacks two essential 
ingredients of the DPLL procedure: the generalized unit clause rule (literals are chosen from the shortest clauses), 
and the possible emergence of contradictions halting a branch growth. Yet, the study on the toy model allows us to 
expose and test the main analytical ideas, before turning to the full analysis of DPLL in Section VB. 



A. Analytical approach for exponentially large search trees: a toy case. 

1. The toy growth dynamics. 

In the toy model of search tree considered hereafter, only 3-clauses matter. Initially, the search tree is empty and
the 3-SAT instance is a collection of $C_3$ 3-clauses drawn from $N$ variables. Next, a variable is randomly picked up
and fixed to true or false, leading to the creation of one node and two outgoing branches carrying formulae with $C_3'$
and $C_3''$ 3-clauses respectively, see Figure [H|. This elementary process is then repeated for each branch. At depth or
"time" $T$, that is when $T$ variables have been assigned along each branch, there are $2^T$ branches in the tree.

The quantity we focus on is the number of branches having a given number of 3-clauses at depth $T$. On each
branch, after a variable has been assigned, $C_3$ decreases (by clause reduction and elimination) or remains unchanged
(when the chosen variable does not appear in the clauses). So, the recipients of 3-clauses attached to each branch,
see Figure 10, leak out with time $T$.

We now assume that each branch in the tree evolves independently of the other ones and obeys a Markovian
stochastic process. This amounts to neglecting correlations between the branches, which could arise from the selection
of the same variable on different branches at different times. As a consequence of this decorrelation approximation,
the value of $C_3$ at time $T + 1$ is a random variable whose distribution depends only on $C_3$ at time $T$ (and on $T$ itself).

The decorrelation approximation allows us to write a master equation for the average number $B(C_3; T)$ of branches
carrying $C_3$ 3-clauses at depth $T$ in the tree,

$$ B(C_3; T+1) = \sum_{C_3'=0}^{\infty} K(C_3, C_3'; T)\, B(C_3'; T). \tag{20} $$

$K$ is the branching matrix; the entry $K(C_3, C_3')$ is the averaged number of branches with $C_3$ clauses issued from
a branch carrying $C_3'$ clauses after one variable is assigned. This number lies between zero (if $C_3 > C_3'$) and two
(the maximum number of branches produced by a split), and is easily deduced from the evolution of the recipient of
3-clauses in Figure 10,

$$ K(C_3, C_3'; T) = 2\, \chi(C_3' - C_3) \binom{C_3'}{C_3' - C_3} \left( \frac{3}{N-T} \right)^{C_3' - C_3} \left( 1 - \frac{3}{N-T} \right)^{C_3}. \tag{21} $$

The factor 2 comes from the two branches created at each split; $\chi(C_3' - C_3)$ equals unity if $C_3' - C_3 \ge 0$, and zero
otherwise. The binomial distribution in (21) comes from the probability that the variable fixed at time $T$ appears
exactly in $C_3' - C_3$ 3-clauses.
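For small sizes, the master equation (20) with the branching matrix (21) can be iterated exactly. The sketch below does so starting from a single root carrying all 3-clauses, then checks two consequences of (21): the total number of branches is $2^T$, and the mean number of 3-clauses per branch is $C_3(0) \prod_{s<T} (1 - 3/(N-s))$. The sizes and function names are illustrative assumptions.

```python
from math import comb


def branching_entry(c3, c3_prev, n_vars, depth):
    """Entry K(C3, C3'; T) of (21); it vanishes unless C3 <= C3'."""
    if c3 > c3_prev:
        return 0.0
    q = 3.0 / (n_vars - depth)       # probability that a 3-clause contains the variable
    k = c3_prev - c3                 # number of 3-clauses leaving the recipient
    return 2.0 * comb(c3_prev, k) * q ** k * (1.0 - q) ** c3


def grow(n_vars, c3_init, depth):
    """Iterate (20) and return the list B[C3] at the requested depth."""
    b = [0.0] * (c3_init + 1)
    b[c3_init] = 1.0                 # root of the tree
    for T in range(depth):
        new = [0.0] * (c3_init + 1)
        for prev, weight in enumerate(b):
            if weight == 0.0:
                continue
            for c3 in range(prev + 1):
                new[c3] += branching_entry(c3, prev, n_vars, T) * weight
        b = new
    return b


if __name__ == "__main__":
    n_vars, c3_init, depth = 60, 120, 15
    b = grow(n_vars, c3_init, depth)
    total = sum(b)
    mean_c3 = sum(c * w for c, w in enumerate(b)) / total
    print(total, 2 ** depth, mean_c3)
```

With $N = 60$ and $T = 15$ the product telescopes to $(45 \cdot 44 \cdot 43)/(60 \cdot 59 \cdot 58)$, the finite-size counterpart of the factor $(1-t)^3$.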



2. Partial differential equation for the distribution of branches. 



For large instances $N, M \to \infty$, this binomial distribution simplifies to a Poisson distribution, with parameter
$m_3(T) = 3 C_3'/(N - T)$. The branching matrix (21) thus reads,

$$ K(C_3, C_3'; T) \simeq \mathcal{K}(C_3' - C_3, m_3(T)) = 2\, \chi(C_3' - C_3)\; e^{-m_3(T)}\, \frac{m_3(T)^{\,C_3' - C_3}}{(C_3' - C_3)!}. \tag{22} $$

Consider now the variations of the entries of $\mathcal{K}$ (22) over a time interval $T_0 = tN \le T \le T_1 = (t + \epsilon) N$. Here, $\epsilon$ is a
small parameter but of the order of one with respect to $N$. In other words, we shall first send $N$ to infinity, and then
$\epsilon$ to zero. $m_3(T)$ weakly varies between $T_0$ and $T_1$: $m_3(t) = 3 c_3/(1 - t) + O(\epsilon)$ where $c_3 = C_3'/N$ is the intensive
variable for the number of 3-clauses. The branching matrix (22) can thus be rewritten, for all times $T$ ranging from
$T_0$ to $T_1$, as

$$ \mathcal{K}(C_3' - C_3, m_3(t)) = 2\, \chi(C_3' - C_3)\; e^{-m_3(t)}\, \frac{m_3(t)^{\,C_3' - C_3}}{(C_3' - C_3)!} + O(\epsilon). \tag{23} $$

We may now iterate eqn. (20) over the whole time interval of length $T = T_1 - T_0 = \epsilon N$,



$$ B(C_3; tN + T) = \sum_{C_3'=0}^{\infty} \left[ \mathcal{K}^T \right](C_3' - C_3; t)\; B(C_3'; tN), \tag{24} $$

where $\mathcal{K}^T$ denotes the $T$-th power of $\mathcal{K}$. As $\mathcal{K}$ depends only on the difference $C_3' - C_3$, it can be diagonalized by
plane waves $v(q_3, C_3') = e^{i q_3 C_3'}/\sqrt{2\pi}$ with wave numbers $0 \le q_3 < 2\pi$ (because $C_3$ is an integer-valued number). The
corresponding eigenvalues read

$$ \lambda_{q_3}(t) = 2 \exp\left[ m_3(t) \left( e^{i q_3} - 1 \right) \right]. \tag{25} $$

Reexpressing matrix (23) using its eigenvectors and eigenvalues, equation (24) reads

$$ B(C_3; tN + T) = \sum_{C_3'} \int_0^{2\pi} \frac{dq_3}{2\pi} \left( \lambda_{q_3}(t) \right)^T e^{i (C_3 - C_3') q_3}\; B(C_3'; tN). \tag{26} $$



Branches duplicate at each time step and proliferate exponentially with $T = tN$. A sensible scaling behavior for
their number is, for any fixed fraction $c_3$,

$$ \lim_{N \to \infty} \frac{1}{N} \ln B(c_3 N; tN) = \omega(c_3, t). \tag{27} $$

Note that $\omega$ has to be divided by $\ln 2$ to obtain the logarithm of the number of branches in base 2 to match the
definition of Section III D 1. Similarly, we introduce the rescaled variable $r_3$, replacing $C_3'$ in the sum of (24),

$$ C_3' - C_3 = r_3\, T. \tag{28} $$



Equation (28) simply means that $r_3$ is of the order of unity when $T$ gets very large, since the number of 3-clauses
that disappear after having fixed $T$ variables is typically of the order of $T$. We finally obtain from eqn. (26),

$$ \exp\left( N \omega(c_3, t + \epsilon) \right) = \int dr_3 \int_0^{2\pi} \frac{dq_3}{2\pi}\, \exp\left[ N \left( \epsilon \ln \lambda_{q_3}(t) - i \epsilon\, r_3 q_3 + \omega(c_3 + \epsilon\, r_3, t) \right) \right]. \tag{29} $$

In the $N \to \infty$ limit, the integrals in (29) may be evaluated using the saddle point method,

$$ \omega(c_3, t + \epsilon) = \max_{r_3, q_3} \left[ \epsilon \ln \lambda_{q_3}(t) - i \epsilon\, r_3 q_3 + \omega(c_3 + \epsilon\, r_3, t) \right] + O(\epsilon^2), \tag{30} $$



due to the terms neglected in (23). Assuming that $\omega(c_3, t)$ is a partially differentiable function of its arguments, and
expanding (30) at the first order in $\epsilon$, we obtain the partial differential equation,

$$ \frac{\partial \omega}{\partial t}(c_3, t) = \max_{r_3, q_3} \left[ \ln \lambda_{q_3}(t) - i\, r_3 q_3 + r_3 \frac{\partial \omega}{\partial c_3}(c_3, t) \right]. \tag{31} $$

The saddle point lies in $q_3 = -i\, \partial \omega / \partial c_3$, leading to,

$$ \frac{\partial \omega}{\partial t}(c_3, t) = \ln 2 - \frac{3 c_3}{1 - t} + \frac{3 c_3}{1 - t} \exp\left[ \frac{\partial \omega}{\partial c_3}(c_3, t) \right]. \tag{32} $$

It is particularly interesting to note that a partial differential equation emerges in (32). In contradistinction with the
evolution of a single branch described by a set of ordinary differential equations (Section IV), the analysis of a full tree
requires handling information about the whole distribution $\omega$ of branches, and thus leads to a partial differential equation.



3. Legendre transform and solution of the partial differential equation.

To solve this equation, it is convenient to define the Legendre transform of the function $\omega(c_3, t)$,

$$ \varphi(y_3, t) = \max_{c_3} \left[ \omega(c_3, t) + y_3 c_3 \right]. \tag{33} $$

From a statistical physics point of view, this is equivalent to passing, at fixed time $t$, from a microcanonical 'entropy' $\omega$
defined as a function of the 'internal energy' $c_3$, to a 'free energy' $\varphi$ defined as a function of the 'temperature' $y_3$.
More precisely, $\varphi(y_3, t)$ is the logarithm divided by $N$ of the generating function of the number of nodes $B(C_3; tN)$.
Equation (33) defines the Legendre relations between $c_3$ and $y_3$,

$$ \frac{\partial \omega}{\partial c_3}\bigg|_{c_3(y_3)} = -y_3 \qquad \text{and} \qquad \frac{\partial \varphi}{\partial y_3}\bigg|_{y_3(c_3)} = c_3. \tag{34} $$

In terms of $\varphi$ and $y_3$, the partial differential equation (32) reads

$$ \frac{\partial \varphi}{\partial t}(y_3, t) = \ln 2 - \frac{3}{1 - t} \left( 1 - e^{-y_3} \right) \frac{\partial \varphi}{\partial y_3}(y_3, t), \tag{35} $$



and is linear in the partial derivatives. This is a consequence of the Poissonian nature of the distribution entering $\mathcal{K}$
(22). The initial condition for the function $\varphi(y_3, t)$ is smoother than for $\omega(c_3, t)$. At time $t = 0$, the search tree is
empty: $\omega(c_3, t = 0)$ equals zero for $c_3 = \alpha_0$, and $-\infty$ for $c_3 \ne \alpha_0$. The Legendre transform is thus $\varphi(y_3, 0) = \alpha_0\, y_3$, a
linear function of $y_3$. The solution of eqn. (35) reads

$$ \varphi(y_3, t) = t \ln 2 + \alpha_0 \ln\left[ 1 - \left( 1 - e^{y_3} \right) (1 - t)^3 \right], \tag{36} $$

and, through Legendre inversion,

$$ \omega(c_3, t) = t \ln 2 + 3 c_3 \ln(1 - t) + \alpha_0 \ln \alpha_0 - c_3 \ln c_3 - (\alpha_0 - c_3) \ln\left[ \frac{\alpha_0 - c_3}{1 - (1 - t)^3} \right], \tag{37} $$



for $0 < c_3 < \alpha_0$, and $\omega = -\infty$ outside this range. We show in Figure [T^ the behavior of $\omega(c_3, t)$ for increasing times $t$.
The curve has a smooth and bell-like shape, with a maximum located in $\hat c_3(t) = \alpha_0 (1 - t)^3$, $\hat \omega(t) = t \ln 2$. The number
of branches at depth $t$ equals $B(t) = e^{N \hat\omega(t)} = 2^{N t}$, and almost all branches carry $\hat c_3(t)\, N$ 3-clauses, in agreement
with the expression for $c_3(t)$ found for the simple branch trajectories in the case $2/3 < \alpha_0 < 3.003$ (Section IV B 3).
The top of the curve $\omega(c_3, t)$, at fixed $t$, provides direct information about the dominant, most numerous branches.
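The closed forms (36) and (37) lend themselves to a quick numerical check. The sketch below evaluates $\omega(c_3, t)$, verifies that its top sits at $c_3 = \alpha_0 (1-t)^3$ with height $t \ln 2$, and compares a brute-force Legendre transform on a grid with (36). The grid size, the sample values $\alpha_0 = 4.3$, $t = 0.2$, $y_3 = 0.3$ and the function names are arbitrary choices made for illustration.

```python
import math

LN2 = math.log(2.0)


def omega(c3, t, alpha0):
    """Equation (37), for 0 < c3 < alpha0 and 0 < t < 1."""
    u = (1.0 - t) ** 3
    return (t * LN2 + 3.0 * c3 * math.log(1.0 - t) + alpha0 * math.log(alpha0)
            - c3 * math.log(c3)
            - (alpha0 - c3) * math.log((alpha0 - c3) / (1.0 - u)))


def phi(y3, t, alpha0):
    """Equation (36)."""
    return t * LN2 + alpha0 * math.log(1.0 - (1.0 - math.exp(y3)) * (1.0 - t) ** 3)


def legendre_on_grid(y3, t, alpha0, n=20000):
    """max over c3 of omega + y3 c3, scanned on an open grid of (0, alpha0)."""
    best = -math.inf
    for k in range(1, n):
        c3 = alpha0 * k / n
        best = max(best, omega(c3, t, alpha0) + y3 * c3)
    return best


if __name__ == "__main__":
    alpha0, t = 4.3, 0.2
    top = alpha0 * (1.0 - t) ** 3
    print("omega at the top:", omega(top, t, alpha0), " t ln 2:", t * LN2)
    print("Legendre on grid:", legendre_on_grid(0.3, t, alpha0),
          " eq. (36):", phi(0.3, t, alpha0))
```

Both pairs of numbers should coincide up to the grid resolution.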

For the real DPLL dynamics studied in next section, the partial differential equation for the growth process is much
more complicated, and exact analytical solutions are not available. If we focus on exponentially dominant branches,
we may as a first approximation follow the dynamical evolution of the points of the curve $\omega(c_3, t)$ around the top. To
do so, we linearize the partial differential equation (35) for the Legendre transform $\varphi$ around the origin $y_3 = 0$ (34),

$$ \frac{\partial \varphi}{\partial t}(y_3, t) \simeq \ln 2 - \frac{3\, y_3}{1 - t}\, \frac{\partial \varphi}{\partial y_3}(y_3, t). \tag{38} $$

The solution of the linearized equation, $\varphi(y_3, t) = t \ln 2 + \alpha_0 (1 - t)^3\, y_3$, is itself a linear function of $y_3$. Through
Legendre inversion, the slope gives us the coordinate $\hat c_3(t)$ of the maximum of $\omega$, and the constant term, the height
$\hat \omega(t)$ of the top.



B. Analysis of the full DPLL dynamics in the unsat phase. 

1. Parallel versus sequential growth of the search tree. 

A generic refutation search tree in the unsat phase is shown in Figure ||B. It is the output of a sequential building 
process: nodes and edges are added by DPLL through successive descents and backtrackings. We have imagined a 
different building up, that results in the same complete tree but can be mathematically analyzed: the tree grows in 
parallel, layer after layer. A new layer is added by assigning, according to DPLL heuristic, one more variable along 
each living branch. As a result, some branches split, others keep growing and the remaining ones carry contradictions 
and die out. 



2. Branching matrix for DPLL. 



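To take into account the operation of the DPLL procedure, see Section II B, we follow, in each branch, the number
of 3-clauses $C_3$ as well as the numbers $C_2$ and $C_1$ of 2- and 1-clauses. The evolution equation for the average number
of branches $B$ carrying instances with $C_j$ $j$-clauses ($j = 1, 2, 3$) reads, from (20),

$$ B(C_1, C_2, C_3; T+1) = \sum_{C_1', C_2', C_3'} K(C_1, C_2, C_3; C_1', C_2', C_3'; T)\; B(C_1', C_2', C_3'; T), \tag{39} $$

where the branching matrix $K$ now equals

$$
\begin{aligned}
K(C_1, C_2, C_3; C_1', C_2', C_3'; T) = {} & \binom{C_3'}{C_3' - C_3} \left( \frac{3}{N-T} \right)^{C_3' - C_3} \left( 1 - \frac{3}{N-T} \right)^{C_3} \sum_{w_2=0}^{C_3' - C_3} \left( \frac{1}{2} \right)^{C_3' - C_3} \binom{C_3' - C_3}{w_2} \times \\
& \Bigg\{ (1 - \delta_{C_1'}) \sum_{z_2=0}^{C_2'} \binom{C_2'}{z_2} \left( \frac{2}{N-T} \right)^{z_2} \left( 1 - \frac{2}{N-T} \right)^{C_2' - z_2} \sum_{w_1=0}^{z_2} \left( \frac{1}{2} \right)^{z_2} \binom{z_2}{w_1}\, \delta_{C_2 - C_2' - w_2 + z_2}\; \delta_{C_1 - C_1' - w_1 + 1} \\
& + \delta_{C_1'} \sum_{z_2=0}^{C_2' - 1} \binom{C_2' - 1}{z_2} \left( \frac{2}{N-T} \right)^{z_2} \left( 1 - \frac{2}{N-T} \right)^{C_2' - 1 - z_2} \sum_{w_1=0}^{z_2} \left( \frac{1}{2} \right)^{z_2} \binom{z_2}{w_1}\, \delta_{C_2 - C_2' - w_2 + z_2 + 1} \left[ \delta_{C_1 - w_1} + \delta_{C_1 - 1 - w_1} \right] \Bigg\},
\end{aligned} \tag{40}
$$

where $\delta_C$ denotes the Kronecker delta function: $\delta_C = 1$ if $C = 0$, $\delta_C = 0$ otherwise.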

To understand formula (40), the picture of recipients in Figure 10 proves to be useful. $K$ expresses the average
flow of clauses into the sink (elimination), or to the recipient to the right (reduction), when passing from depth $T$ to
depth $T + 1$ by fixing one variable. Among the $C_3' - C_3$ clauses that flow out from the leftmost 3-clauses recipient,
$w_2$ clauses are reduced and go into the 2-clauses container, while the remaining $C_3' - C_3 - w_2$ are eliminated. $w_2$ is
a random variable in the range $0 \le w_2 \le C_3' - C_3$ and drawn from a binomial distribution of parameter 1/2, which
represents the probability that the chosen literal is the negation of the one in the clause.

We have assumed that the algorithm never chooses the variable among 3-clauses. This hypothesis is justified a
posteriori because in the unsat region, there is always (except at the initial time $t = 0$) an extensive number of
2-clauses. Variables are chosen among 1-clauses, or if none is present, among 2-clauses. The term on the r.h.s. of
eqn. (40) beginning with $\delta_{C_1'}$ (respectively $1 - \delta_{C_1'}$) corresponds to the latter (resp. former) case. $z_2$ is the number
of clauses (other than the one from which the variable is chosen) flowing out from the second recipient; it obeys
a binomial distribution with parameter $2/(N - T)$, equal to the probability that the chosen variable appears in a
2-clause. The 2-clauses container is, at the same time, poured with $w_2$ clauses. In an analogous way, the unitary
clauses recipient welcomes $w_1$ new clauses if it was empty at the previous step. If not, a 1-clause is eliminated by
fixing the corresponding literal.

The branch keeps growing as long as the level $C_1$ of the unit clauses recipient remains low, i.e. $C_1$ remains of the
order of unity so that the probability to have two, or more, 1-clauses with opposite literals can be neglected. For this
reason, we do not take into account the (extremely rare) event of 1-clauses including equal or opposite literals and $z_1$
is always equal to zero. We shall consider later a halt criterion for the tree growth process, see dot-dashed line in the
phase diagram of Figure^, where the condition $C_1 = O(1)$ breaks down due to an avalanche of 1-clauses.

Finally, we sum over all possible flow values $w_2, z_2, w_1$ that satisfy the conservation laws $C_2 - C_2' = w_2 - z_2$,
$C_1 - C_1' = w_1 - 1$ when $C_1' \ne 0$ or, when $C_1' = 0$, $C_2 - C_2' = w_2 - z_2 - 1$, $C_1 = w_1$ if the literal is the same as the one
in the clause or $C_1 = w_1 + 1$ if the literal is the negation of the one in the clause. The presence of two $\delta$ is responsible
for the growth of the number of branches. In the real sequential DPLL dynamics, the inversion of a literal at a node
requires backtracking; here, the two edges grow in parallel at each node according to Section V B 1.

In the large $N$ limit, the matrix (40) can be written in terms of Poisson distributions through the introduction of
the parameters $m_3 = 3 C_3'/(N - T)$ and $m_2 = 2 C_2'/(N - T)$, see eqn. (22).

3. Ground state of the branching matrix and localization properties.

Due to the translational invariances of $K$ in $C_3' - C_3$ and $C_2' - C_2$, the vectors

$$ v_{q, q_2, q_3}(C_1', C_2', C_3') = e^{i (q_2 C_2' + q_3 C_3')}\; v_q(C_1') \tag{41} $$

are eigenvectors of the matrix $K$ with eigenvalues

$$ \lambda_{q, q_2, q_3} = \exp\left[ m_3 \left( \frac{e^{i q_3}}{2} \left( 1 + e^{-i q_2} \right) - 1 \right) \right] \lambda_{q, q_2}, \tag{42} $$

if and only if $v_q(C_1)$ is an eigenvector, with eigenvalue $\lambda_{q, q_2}$, of the reduced matrix

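$$ K_{q_2}(C_1, C_1') = \sum_{z_2=0}^{\infty} e^{-m_2}\, \frac{m_2^{z_2}}{z_2!} \left( \frac{e^{i q_2}}{2} \right)^{z_2} \sum_{w_1=0}^{z_2} \binom{z_2}{w_1} \left[ (1 - \delta_{C_1'})\, \delta_{C_1 - C_1' - w_1 + 1} + e^{i q_2}\, \delta_{C_1'} \left( \delta_{C_1 - w_1} + \delta_{C_1 - 1 - w_1} \right) \right]. \tag{43} $$

Note that, while $q_2, q_3$ are wave numbers, $q$ is a formal index used to label eigenvectors. The matrix $K_{q_2}$ (43) has
been obtained by applying $K$ onto the vector $v$ (41), and summing over $C_3'$, $C_2'$, $w_2$ and $z_3$.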

The diagonalization of the non-Hermitian matrix $K_{q_2}$ is exposed in Appendix A, and relies upon the introduction
of the generating functions of the eigenvectors $v_q$,

$$ V_q(x) = \sum_{C_1=0}^{\infty} v_q(C_1)\; x^{C_1}. \tag{44} $$

The eigenvalue equation for $K_{q_2}$ translates into a self-consistent equation for $V_q(x)$, the singularities of which can be
analyzed in the $x$ plane, and permit the calculation of the largest eigenvalue$^2$ of $K_{q_2}$,

$$ \lambda_{0, q_2} = \frac{2}{-1 + \sqrt{1 + 4 e^{-i q_2}}}\; \exp\left\{ -m_2 + m_2\, \frac{e^{i q_2}}{4} \left( 1 + \sqrt{1 + 4 e^{-i q_2}} \right) \right\}. \tag{45} $$

$^2$ We show in next Section that $q_2$ is purely imaginary at the saddle-point, and therefore the eigenvalue in eqn. (45) is real-valued.






The properties of the corresponding, maximal eigenvector $v_0$ are important. Depending on parameters $q_2$ and $m_2$,
$v_0(C_1)$ is either localized around small integer values of $C_1$ (the average number of 1-clauses is of the order of
unity), or extended (the average value of $C_1$ is of the order of $N$). As contradictions inevitably arise when $C_1 = O(N)$,
the delocalization transition undergone by the maximal eigenvector provides a halt criterion for the growth of the
tree.
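The largest eigenvalue can be checked numerically at $q_2 = 0$. The sketch below is an illustration built on an assumption: the reduced matrix is written down from the verbal description of the flows in Section V B 2 in the Poisson limit (each step, a Poisson number of mean $m_2/2$ of new 1-clauses is produced; one 1-clause is removed if the recipient was not empty; an empty recipient yields two branches, with $w_1$ and $w_1 + 1$ unit clauses). Its dominant eigenvalue, obtained by power iteration on a truncated matrix, is compared with expression (45) at $q_2 = 0$, that is $\frac{1+\sqrt5}{2} \exp[-m_2 (3 - \sqrt5)/4]$. Truncation size and function names are illustrative.

```python
import math


def reduced_matrix(m2, size):
    """K(C1, C1') at q2 = 0, truncated to 0 <= C1, C1' < size (assumed form)."""
    mean = 0.5 * m2       # an outgoing 2-clause becomes a 1-clause with prob. 1/2
    pois = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(size + 1)]
    K = [[0.0] * size for _ in range(size)]
    for c1_prev in range(size):
        for c1 in range(size):
            if c1_prev > 0:
                w1 = c1 - c1_prev + 1          # unit propagation removes one 1-clause
                if w1 >= 0:
                    K[c1][c1_prev] = pois[w1]
            else:
                # split: two branches, carrying w1 and w1 + 1 unit clauses
                K[c1][0] = pois[c1] + (pois[c1 - 1] if c1 >= 1 else 0.0)
    return K


def largest_eigenvalue(K, n_iter=300):
    """Power iteration with L1 normalization (entries are non-negative)."""
    size = len(K)
    v = [1.0 / size] * size
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(K[i][j] * v[j] for j in range(size)) for i in range(size)]
        lam = sum(w)                            # valid because sum(v) == 1
        v = [x / lam for x in w]
    return lam


def eigenvalue_45_at_zero(m2):
    golden = 0.5 * (1.0 + math.sqrt(5.0))
    return golden * math.exp(-m2 * (3.0 - math.sqrt(5.0)) / 4.0)


if __name__ == "__main__":
    for m2 in (0.5, 1.0):
        print(m2, largest_eigenvalue(reduced_matrix(m2, 40)), eigenvalue_45_at_zero(m2))
```

For these values of $m_2$ the maximal eigenvector is localized, so a truncation at a few tens of unit clauses is harmless; closer to the delocalization transition a larger truncation would be needed.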



4- Partial differential equation for the search tree growth. 



Following Section [V A 2| and eqn. ([26|) , we write the evolution of the number of branches between times t N and 
tN + T using the spectral decomposition of K, 

B(d,C 2 ,C 3 ;tN + T)= J2 E/ 27r ^^e^(^) +32 (c 2 -^)]- g(C(i) -+ (cD x 

(\ m3 ) T B(C[,C' 2 ,C 3 ,tN) . (46) 

J2 q denotes the (discrete or continuous) sum on all eigenvectors and u+ the left eigenvector of K. We make the 
adiabatic hypothesis that the probability to have C\ unit clauses at time tN + T becomes stationary on the time 
scale 1 <C T <C N and is independent of the number C[ of 1-clauses at time t N (Section [V). As T gets large, and 
at fixed (72,93, the sum over q is more and more dominated by the largest eigenvalue q — 0, due to the gap between 
the first eigenvalue (associated to a localized eigenvector) and the continuous spectrum of delocalized eigenvectors 
(Appendix [A]). Let us call A(q 2l 93) = Ao, 92 , g3 this largest eigenvalue, obtained from equations ( ff5| ) and (p2[). Defining 
the average of the number of branches over the equilibrium distribution of 1-clauses, 



B(C 2l C 3 ;T) 



(47) 



equation (46) leads to

$$B(C_2,C_3;tN+T) = \sum_{C_2',C_3'} \int_0^{2\pi}\frac{dq_2}{2\pi}\,\frac{dq_3}{2\pi}\; e^{\,i[q_3(C_3-C_3')+q_2(C_2-C_2')]}\;\left(\lambda(q_2,q_3)\right)^T B(C_2',C_3';tN). \tag{48}$$



The calculation now follows closely the lines of Section V A 2. We call $\omega(c_2,c_3,t) = \lim_{N\to\infty} \ln B(c_2 N, c_3 N; tN)/N$
the logarithm of the number of branches carrying an instance with $c_2 N$ 2-clauses and $c_3 N$ 3-clauses at depth $tN$.
Similarly, we rewrite the sums on $C_2', C_3'$ on the r.h.s. of eqn. (48) as integrals over the reduced variables
$r_2 = (C_2 - C_2')/T$, $r_3 = (C_3 - C_3')/T$, see equations (26) and (29). A saddle-point calculation of the four integrals over
$q_2, q_3, r_2, r_3$ can be carried out, resulting in a partial differential equation for $\omega(c_2,c_3,t)$,

$$\frac{\partial\omega}{\partial t}(c_2,c_3,t) = \ln\lambda\left[-i\,\frac{\partial\omega}{\partial c_2},\; -i\,\frac{\partial\omega}{\partial c_3}\right], \tag{49}$$



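or, equivalently,

$$\frac{\partial\omega}{\partial t} = \frac{\partial\omega}{\partial c_2} + \ln\left[\frac{1+\sqrt{1+4\,e^{-\partial\omega/\partial c_2}}}{2}\right] - \frac{3c_3}{1-t}\left[1-\frac{e^{\partial\omega/\partial c_3}}{2}\left(1+e^{-\partial\omega/\partial c_2}\right)\right] - \frac{2c_2}{1-t}\left[1-\frac{e^{\partial\omega/\partial c_2}}{4}\left(1+\sqrt{1+4\,e^{-\partial\omega/\partial c_2}}\right)\right]. \tag{50}$$

5. Approximate solution of the partial differential equation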



As in Section V A 3, we introduce the Legendre transformation of $\omega(c_2,c_3,t)$,

$$\varphi(y_2,y_3,t) = \max_{c_2,c_3}\left[\,\omega(c_2,c_3,t) + y_2 c_2 + y_3 c_3\,\right]. \tag{51}$$
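The resulting partial differential equation on $\varphi$ is given in Appendix B, and cannot be solved analytically. We therefore
limit ourselves to the neighborhood of the top of the surface $\omega(c_2,c_3,t)$ through a linearization around $y_2 = y_3 = 0$,

$$\frac{\partial\varphi}{\partial t} = \ln\left(\frac{1+\sqrt5}{2}\right) - \frac{5+\sqrt5}{10}\,y_2 - \frac{2}{1-t}\left[\frac{3-\sqrt5}{4}+\frac{5+3\sqrt5}{20}\,y_2\right]\frac{\partial\varphi}{\partial y_2} - \frac{3}{1-t}\left(y_3-\frac{y_2}{2}\right)\frac{\partial\varphi}{\partial y_3}. \tag{52}$$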



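The solution of eqn. (52) is given by the combination of a particular solution and the solution of the
homogeneous counterpart of equation (52). We write below the general solution for any 2+p-SAT unsat problem
with parameters $p_0$, $\alpha_0 > \alpha_C(p_0)$, the 3-SAT case being recovered when $p_0 = 1$. The initial condition at time $t = 0$ reads
$\varphi(y_2,y_3,0) = \alpha_0\,p_0\,y_3 + \alpha_0\,(1-p_0)\,y_2$. We obtain

$$\varphi^{(1)}(y_2,y_3,t) = c_2(t)\,y_2 + c_3(t)\,y_3 + \omega(t) \tag{53}$$

with

$$\begin{aligned}
c_2(t) &= \frac{75+9\sqrt5}{116}\,\alpha_0\,p_0\left[(1-t)^{\frac{5+3\sqrt5}{10}} - (1-t)^3\right] + \left(\alpha_0(1-p_0)+2+\sqrt5\right)(1-t)^{\frac{5+3\sqrt5}{10}} - (2+\sqrt5)(1-t), \\
c_3(t) &= \alpha_0\,p_0\,(1-t)^3, \\
\omega(t) &= t\left[\ln\left(\frac{1+\sqrt5}{2}\right)+\frac{1+\sqrt5}{2}\right] + \frac{15-4\sqrt5}{58}\,\alpha_0\,p_0\left[1-(1-t)^3\right] \\
&\quad - \frac{7\sqrt5-15}{2}\left[\alpha_0(1-p_0)+2+\sqrt5+\frac{75+9\sqrt5}{116}\,\alpha_0\,p_0\right]\left[1-(1-t)^{\frac{5+3\sqrt5}{10}}\right].
\end{aligned} \tag{54}$$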



Within the linearized approximation, the distribution $\omega(c_2,c_3,t)$ has its maximum located at $c_2(t), c_3(t)$ with a height
$\omega(t)$, and is equal to minus infinity elsewhere. The coordinates of the maximum as functions of the depth $t$ define
the tree trajectory, i.e. the evolution, driven by the action of DPLL, of the dominant branches in the phase diagram
of Figure |]. We obtain straightforwardly this trajectory from equations (54) and the transformation rules (||).
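As a rough numerical illustration, the linearized solution can be followed until it reaches the halt line. The sketch below is not the authors' code: it assumes that $c_2(t)$, $c_3(t)$ and $\omega(t)$ take the closed forms written in the functions, that the transformation rules are $p = c_3/(c_2+c_3)$ and $\alpha = (c_2+c_3)/(1-t)$, and that the halt line is $\alpha(1-p) = (3+\sqrt5)\ln[(1+\sqrt5)/2]/2 \simeq 1.26$. All function names are hypothetical.

```python
import math

SQRT5 = math.sqrt(5.0)
GOLDEN = (1.0 + SQRT5) / 2.0                     # (1 + sqrt 5)/2
MU = (5.0 + 3.0 * SQRT5) / 10.0                  # exponent of the (1 - t)^mu terms
HALT = (3.0 + SQRT5) * math.log(GOLDEN) / 2.0    # assumed value of alpha (1 - p) on the halt line


def _amplitudes(alpha0, p0):
    """Amplitudes entering the assumed linearized solution."""
    a = (75.0 + 9.0 * SQRT5) / 116.0 * alpha0 * p0
    b = alpha0 * (1.0 - p0) + 2.0 + SQRT5
    return a, b


def c2(t, alpha0, p0=1.0):
    """Density of 2-clauses carried by the dominant branches at depth t."""
    a, b = _amplitudes(alpha0, p0)
    s = 1.0 - t
    return a * (s ** MU - s ** 3) + b * s ** MU - (2.0 + SQRT5) * s


def c3(t, alpha0, p0=1.0):
    """Density of 3-clauses carried by the dominant branches at depth t."""
    return alpha0 * p0 * (1.0 - t) ** 3


def omega(t, alpha0, p0=1.0):
    """Natural logarithm (per variable) of the number of dominant branches."""
    a, b = _amplitudes(alpha0, p0)
    s = 1.0 - t
    return (t * (math.log(GOLDEN) + GOLDEN)
            + (15.0 - 4.0 * SQRT5) / 58.0 * alpha0 * p0 * (1.0 - s ** 3)
            - (7.0 * SQRT5 - 15.0) / 2.0 * (a + b) * (1.0 - s ** MU))


def phase_point(t, alpha0, p0=1.0):
    """Coordinates (p, alpha) of the tree trajectory at depth t."""
    two, three = c2(t, alpha0, p0), c3(t, alpha0, p0)
    return three / (two + three), (two + three) / (1.0 - t)


def halt_time(alpha0, p0=1.0, step=1e-4):
    """First depth at which alpha (1 - p) = c2 / (1 - t) reaches the halt line."""
    gap = lambda t: c2(t, alpha0, p0) / (1.0 - t) - HALT
    if gap(0.0) >= 0.0:
        return 0.0
    lo = 0.0
    while lo + step < 1.0 and gap(lo + step) < 0.0:   # coarse scan for the first crossing
        lo += step
    if lo + step >= 1.0:
        raise ValueError("trajectory never reaches the halt line")
    hi = lo + step
    for _ in range(60):                               # refine by bisection
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def complexity(alpha0, p0=1.0):
    """Base-2 logarithm of the tree size per variable, within the linear approximation."""
    return omega(halt_time(alpha0, p0), alpha0, p0) / math.log(2.0)
```

Under these assumptions, `complexity(10.0)` and `complexity(20.0)` should come out close to the linear-approximation values quoted for these ratios in Table II, and `phase_point` traces the corresponding trajectory in the $(p,\alpha)$ plane.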



6. Interpretation of the tree trajectories and results for the complexity. 



In Figure [|, the tree trajectories corresponding to solving 3-SAT instances with ratios $\alpha_0 = 4.3$, 7 and 10 are shown.
The trajectories start on the right vertical axis $p = 1$ and head to the left until they hit the halt line $\alpha \simeq 1.259/(1-p)$
(dot-dashed curve) at some time $t_h > 0$, which depends on $\alpha_0$. On the halt line, a delocalization transition for the
largest eigenvector takes place (for parameters $y_2 = y_3 = 0$, see Appendix A and Figure 16) and causes an avalanche
of unitary clauses with the emergence of contradictions, preventing branches from further growing.

The delocalization transition taking place on the halt line means that the stationary probability $\mu_t(0)$ of having no
unit-clause,

$$\mu_t(0) = \frac{v_0(0)}{\sum_{C_1\ge 0} v_0(C_1)} = \frac{v_0(0)}{V_0(1)}, \tag{55}$$



vanishes at $t = t_h$. From equations (J42L^5U5q, Al A3), the largest eigenvalue for dominant branches, $\lambda(0,0)$, reaches
its lowest value, one, on the halt line. As expected, the emergence of contradictions on dominant branches coincides
with the halt of the tree growth, see equation (49).

The logarithm $\omega(t)$ of the number of dominant branches increases along the tree trajectory, from zero at $t = 0$ up to
some value $\omega(t_h) > 0$ on the halt line. This final value, divided by $\ln 2$, is our analytical prediction (within the linear
approximation of Section V B 5) for the complexity $\omega$. We describe in Appendix B a refined, quadratic expression for
the Legendre transform $\varphi$ (51) that provides another estimate of $\omega$.

The theoretical values of $\omega$, within linear and quadratic approximations, are shown in Table II for $\alpha_0 =
20, 15, 10, 7, 4.3$, and compare very well with numerical results. Our calculation, which is in fact an annealed
estimate of $\omega$ (Section III D 1), is very accurate. The decorrelation approximation (Section V A) becomes more and more
precise with larger and larger ratios $\alpha_0$. Indeed, the probability that the same variable appears twice in the search
tree decreases for smaller trees. For large values of $\alpha_0$, we obtain

$$\omega(\alpha_0) \simeq \frac{(3+\sqrt5)\left[\ln\left(\frac{1+\sqrt5}{2}\right)\right]^2}{6\,\ln 2}\;\frac{1}{\alpha_0} \simeq \frac{0.292}{\alpha_0}. \tag{56}$$

The $1/\alpha_0$ scaling of $\omega$ has been previously proven by Beame, Karp, and Pitassi pg[| , independently of the particular
heuristics used. Showing that there is no solution for random 3-SAT instances with large ratios $\alpha_0$ is relatively easy,
since assumptions on few variables generate a large number of logical consequences, and contradictions emerge quickly,
see Figure 0. This result can be inferred from Figure 0. As $\alpha_0$ increases, the distance between the vertical 3-SAT axis
and the halt line decreases; consequently, the trajectory becomes shorter, and the search tree smaller.
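The approach to the large-ratio regime can be checked with a few lines of arithmetic. The sketch below evaluates the prefactor of the $1/\alpha_0$ law and compares it with the products $\alpha_0\,\omega$ built from the linear-approximation theory values listed in Table II; the function and variable names are illustrative only.

```python
import math


def asymptotic_coefficient():
    """Prefactor of the 1/alpha_0 law: (3 + sqrt 5) [ln((1 + sqrt 5)/2)]^2 / (6 ln 2)."""
    golden = (1.0 + math.sqrt(5.0)) / 2.0
    return (3.0 + math.sqrt(5.0)) * math.log(golden) ** 2 / (6.0 * math.log(2.0))


# Theory column "lin." of Table II: ratio alpha_0 -> omega
TABLE_II_LINEAR = {20: 0.0152, 15: 0.0206, 10: 0.0319, 7: 0.0477, 4.3: 0.0875}


def scaled_complexities():
    """alpha_0 * omega for each tabulated ratio; expected to decrease toward the prefactor."""
    return {alpha: alpha * omega for alpha, omega in TABLE_II_LINEAR.items()}


if __name__ == "__main__":
    print("prefactor:", round(asymptotic_coefficient(), 4))
    for alpha, product in sorted(scaled_complexities().items()):
        print(alpha, round(product, 3))
```

The products decrease as the ratio grows, from about 0.376 at $\alpha_0 = 4.3$ to about 0.304 at $\alpha_0 = 20$, consistent with a slow convergence toward the asymptotic prefactor.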

7. Length of the dominant branches. 

To end with, we calculate the length of the dominant branches. The probability that a splitting occurs at time $t$ is
$\mu_t(0)$ defined in (55). Let us define $n(L,T)$ as the number of branches having $L$ nodes at depth $T$ in the tree. The
evolution equation for $n$ is $n(L,T+1) = (1-\mu_t(0))\,n(L,T) + 2\,\mu_t(0)\,n(L-1,T)$. The average branch length at time $T$,
$\langle L(T)\rangle = \sum_{L=1}^{T} L\,n(L,T) / \sum_{L=0}^{T} n(L,T)$, obeys the simple evolution relation $\langle L(T+1)\rangle - \langle L(T)\rangle = 2\,\mu_t(0)/(1+\mu_t(0))$.
Therefore, the average number of nodes (divided by $N$), $\langle\ell\rangle = \langle L(t_h N)\rangle/N$, present along dominant branches once the
tree is complete, is equal to

$$\langle\ell\rangle = \int_0^{t_h} dt\;\frac{2\,\mu_t(0)}{1+\mu_t(0)}, \tag{57}$$

where $t_h$ is the halt time. For large ratios $\alpha_0$, the average length of dominant branches scales as

$$\langle\ell\rangle(\alpha_0) \simeq \frac{(3+\sqrt5)\,\ln\left(\frac{1+\sqrt5}{2}\right)+1-\sqrt5}{3\,\alpha_0} \simeq \frac{0.428}{\alpha_0}. \tag{58}$$

The good agreement between this prediction and the numerical results can be checked on the insets of Figure 7 for
different values of $\alpha_0$.
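The increment of the average branch length can be verified by iterating the evolution equation for $n(L,T)$ directly. The sketch below uses an arbitrary, made-up schedule of splitting probabilities purely for illustration; in the actual calculation these probabilities follow from the stationary distribution of unit clauses.

```python
def evolve(counts, mu):
    """One step of n(L, T+1) = (1 - mu) n(L, T) + 2 mu n(L-1, T)."""
    new_counts = [0.0] * (len(counts) + 1)
    for length, count in enumerate(counts):
        new_counts[length] += (1.0 - mu) * count       # no splitting: length unchanged
        new_counts[length + 1] += 2.0 * mu * count     # splitting: two branches, one more node
    return new_counts


def mean_length(counts):
    """Average number of nodes per branch."""
    return sum(length * count for length, count in enumerate(counts)) / sum(counts)


def mean_lengths(splitting_probabilities):
    """Average branch length after each step, starting from a single branch without node."""
    counts = [1.0]
    means = [mean_length(counts)]
    for mu in splitting_probabilities:
        counts = evolve(counts, mu)
        means.append(mean_length(counts))
    return means


if __name__ == "__main__":
    schedule = [0.9, 0.5, 0.2, 0.0]                    # illustrative values only
    means = mean_lengths(schedule)
    for step, mu in enumerate(schedule):
        print(means[step + 1] - means[step], 2.0 * mu / (1.0 + mu))
```

Each printed pair should agree: the growth of the mean length at one step depends only on the current splitting probability, which is why the total length reduces to an integral over the depth.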

VI. MIXED TRAJECTORIES AND THE INTERMEDIATE EXPONENTIAL REGIME (UPPER SAT PHASE).

In this section, we show how the complexity of solving 3-SAT instances with ratios in the intermediate range
$\alpha_L < \alpha_0 < \alpha_C$ can be understood by combining the previous results on branch and tree trajectories.

A. Branch trajectories and the critical line of 2+p-SAT. 

In the upper sat phase, the single branch trajectory intersects the critical line $\alpha_C(p)$ at some point G, whose
coordinates depend on the initial ratio $\alpha_0$. The point G corresponding to $\alpha_0 = 3.5$ is shown in Figure ||.

For a finite size $N$, the critical 2+p-SAT region (also called critical window) around G has a non-zero width $W$
in terms of the numbers of clauses and variables, much smaller than $N$, since the transition is sharp [pi. Let us
call $G_-$ and $G_+$ the lower and upper borders of this window along the first branch run by DPLL, see the bold line in
Figure 11. We also denote by $N_-$ and $N_+$ the average numbers of variables not assigned by DPLL at points $G_-$ and
$G_+$ respectively: $N_+ < N_- < N$. $G_-$ carries an unsat 2+p-SAT instance. A refutation subtree must be built by
DPLL before backtracking above $G_-$. The corresponding (sub)tree trajectory, starting from G and penetrating the
unsat phase up to the arrest line, is shown in Figure ||.

The size $2^{N_-\omega_-}$ of the subtree obviously provides a lower bound to the total complexity. Now, once the subtree has
been entirely explored, DPLL backtracks to some node lying above $G_-$ in the tree (Figures || and 11). The highest
backtracking node, say $G_0$, is necessarily the deepest one (when starting from above) along the first DPLL branch that
carries a satisfiable 2+p-instance, and lies below $G_+$. Therefore, a solution must necessarily be found by DPLL below
$G_+$. The corresponding branch (rightmost path in Figure ^|C) is highly non-typical and does not contribute to the
complexity, since almost all branches in the search tree are described by the tree trajectory issued from G (Figure Q).
The total size of the search tree is thus bounded from above by $2^{W} \times 2^{N_-\omega_-}$ and, to exponentially dominant order,
equivalent to the size of the subtree below $G_-$.
B. Analytical calculation of the size of the refutation subtree.

The coordinates $p_G$, $\alpha_G = \alpha_C(p_G)$ of the crossing point G depend on the initial 3-SAT ratio $\alpha_0$ and may be
computed from the knowledge of the 2+p-SAT critical line $\alpha_C(p)$ and the branch trajectory equations jl^). For
$\alpha_0 = 3.5$, we obtain $p_G = 0.78$ and $\alpha_G = 3.02$. Point G is reached by the branch trajectory once a fraction $t_G \simeq 0.19$
of variables have been assigned by DPLL.

Once G is known, we consider the unsatisfiable 2+p-SAT instances with parameters $p_G, \alpha_G$ as a starting point for
DPLL. The calculation exposed in Section V can be used with initial conditions $p_G, \alpha_G$. We show in Table IV the
results of the analytical calculation of $\omega_G$, within linear and quadratic approximations, for a starting ratio $\alpha_0 = 3.5$.
Note that the discrepancy between both predictions is larger than for higher values of $\alpha_0$.

The logarithm $\omega$ of the total complexity is defined through the identity $2^{N\omega} = 2^{N_G\,\omega_G}$, where $N_G = N(1-t_G)$, or equivalently,

$$\omega = \omega_G\,(1-t_G). \tag{59}$$

The resulting value for $\alpha_0 = 3.5$ is shown in Table IV.
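The rescaling from the subtree complexity to the total complexity is a single multiplication. The sketch below applies it to the 2+p-SAT theoretical values of Table IV (0.0440 and 0.0407 within the linear and quadratic approximations) with $t_G \simeq 0.19$; the function name is illustrative.

```python
def total_complexity(omega_g, t_g):
    """omega = omega_G (1 - t_G): only the N (1 - t_G) variables left at G enter the subtree."""
    return omega_g * (1.0 - t_g)


if __name__ == "__main__":
    t_g = 0.19                                           # fraction of variables assigned when G is reached
    print("linear:   ", total_complexity(0.0440, t_g))   # about 0.0356
    print("quadratic:", total_complexity(0.0407, t_g))   # about 0.0330
```

Both numbers agree, within the rounding of $t_G$, with the 3-SAT theoretical entries 0.0355 and 0.0329 of Table IV.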

C. Comparison with numerics for $\alpha_0 = 3.5$.

We have checked numerically the scenario of Section VI A in two ways.

First, we have computed, during the action of DPLL, the coordinates in the $(p, \alpha)$ plane of the highest backtracking
point in the search tree. The agreement with the coordinates of G computed in the previous paragraph is very good
(Section III). However, the numerical data show large fluctuations and the experimental fits are not very accurate,
leading to uncertainties on $p_G$ and $\alpha_G$ of the order of 0.01 and 0.02 respectively. In addition, note that the analytical
values of the coordinates of G are not exact since the critical line $\alpha_C(p)$ is not rigorously known (Section II C).

Secondly, we compare in Table IV the experimental measures and theoretical predictions of the complexity starting
from G. The agreement between all values is quite good and leads to a complexity of about $\omega_G = 0.042 \pm 0.002$. Numerics
indicate that the annealed value of the complexity is equal to (or slightly larger than) the typical value. Therefore the
annealed calculation developed in Section V agrees well with the data obtained for 2+p-SAT instances. Once $\omega_G$ and $t_G$
are known, eqn. (59) gives access to the theoretical value of $\omega$.



The agreement between theory and experiment is very satisfactory (Table IV). Nevertheless, let us stress the
existence of some uncertainty regarding the values of the highest backtracking point coordinates $p_G, \alpha_G$. Numerical
simulations on 2+p-SAT instances and analytical checks show that $\omega$ depends strongly on the initial fraction $p_0$.
Variations of the initial parameter $p_G$ around 0.78 by $\Delta p_G = 0.01$ change the final result for the complexity by
$\Delta\omega = 0.003$ to $0.004$, twice as large as the statistical uncertainty at fixed $p_0 = p_G = 0.78$. Improving the accuracy of
the data would require a precise determination of the coordinates of G.

We show in Figure || the trajectory of the atypical, rightmost branch (ending with a solution) in the tree, obtained
from simulations for $N = 300$. It comes as no surprise that this trajectory, which carries a satisfiable and highly
biased 2+p-SAT instance, may enter the unsat region defined for the unbiased 2+p-SAT distribution. The trajectory
eventually reaches the $\alpha = 0$ axis when all clauses are eliminated. Notice that the end point is not S, but the lower
left corner of the phase diagram.

As a conclusion, our work shows that, in the $\alpha_L < \alpha < \alpha_C$ range, the complexity of solving 3-SAT is related to the
existence of a critical point of 2+p-SAT. The right part of the 2+p-SAT critical line, comprised between T and the
threshold point of 3-SAT, can be determined experimentally as the locus of the highest backtracking points in 3-SAT
solving search trees, when the starting ratio $\alpha_0$ spans the interval $\alpha_L < \alpha_0 < \alpha_C$.

VII. COMPLEXITY OF 2+P-SAT SOLVING AND RELATIONSHIP WITH STATIC. 

A. Complexity diagram. 

We have analyzed in the previous sections the computational complexity of 3-SAT solving. The analysis may be
extended to any 2+p-SAT instance, with the results shown in Figure 14.

In addition to the three regimes unveiled in the study of 3-SAT$^3$, a new complexity region appears, on the left
side of the line $\alpha = 1/(1-p)$, referred to as the "weak delocalization" line. To the right (respectively left) of the
weak delocalization line, the second largest eigenvector of the branching matrix $K$ is localized (resp. delocalized),
see Appendix A. When solving a 3-SAT instance, or more generally a 2+p-SAT instance with parameters $p_0$, $\alpha_0 <
1/(1-p_0)$, the size of the search tree is exponential in $N$ when the weak delocalization line is crossed by the tree
trajectory (see figure). Thus, the contribution to the average flow of unit-clauses coming from the second largest
eigenvector is exponentially damped by the largest eigenvector contribution, and no contradiction arises until the
halt, strong delocalization line is hit.

If one now desires to solve a 2+p-SAT instance whose representative point $p_0, \alpha_0$ lies to the left of the weak
delocalization curve, the delocalization of the second largest eigenvector immediately stops the growth of the tree,
before the distribution of 1-clauses can reach equilibrium, see the discussion of Section V B 4. Therefore, in the range of
parameters $p_0$, $\alpha_0 > 1/(1-p_0)$, proving unsatisfiability does not require an exponentially large computational effort.



B. Polynomial/exponential crossover and the tricritical point. 

The inset of Figure 14 shows a schematic blow-up of the neighborhood of T and $T_S$, where all complexity regions
meet. From the above discussion, the complexity of solving critically constrained 2+p-SAT instances is polynomial
up to $p_S$, and exponential above, in agreement with previous claims based on numerical investigations p3| . In the
range $2/5 < p_0 < p_S$, computational complexity exhibits a somewhat unusual behavior as a function of $\alpha$. The
peak of hardness is indeed not located at criticality (where the scaling of complexity is only polynomial), but slightly
below threshold, where complexity is exponential. Unfortunately, the narrowness of the region shown in the inset of
Figure 14 seems to rule out the possibility of checking this statement through experiments.

To end with, let us stress that T, conversely to $T_S$, depends a priori on the splitting heuristic. Nevertheless, the
location of T seems to be less sensitive to the choice of the heuristics than branch trajectories. For instance, the
UC and GUC heuristics both lead to the same tangential hit point T, while the starting ratios of the corresponding
trajectories, $\alpha_L = 8/3$ and $\alpha_L \simeq 3.003$, differ. Understanding this relative robustness, and the surprising closeness of
T and $T_S$, would be interesting.



VIII. CONCLUSION AND PERSPECTIVES. 



In this paper we have analyzed the action of a search algorithm, the DPLL procedure, on random 3-SAT instances
to derive the typical complexity as a function of the size (number of variables) $N$ of the instance and the number $\alpha$ of
clauses per variable. The easy (polynomial in $N$) as well as the hard (exponential in $N$) regimes have been investigated.
We have measured, through numerical simulations, the size and the structure of the search tree by computing the
number of nodes, the distribution of branch lengths, and the highest backtracking point. From a theoretical point
of view, we have analyzed the dynamical evolution of a randomly drawn 3-SAT instance under the action of DPLL.
The random 3-SAT statistical ensemble, described by a single parameter $\alpha$, is not stable under the action of DPLL.
Another variable $p$, the fraction of length-three clauses, has to be considered to account for the later evolution of the
instance$^4$. Parameters $p$ and $\alpha$ are the coordinates of the phase diagram of Figure [|. The dynamical evolution of the
instance is itself of stochastic nature, due to the random choices made by the splitting rule. We can follow the ensemble
evolution in 'time', that is, the number of variables assigned by DPLL, and represent this evolution by a trajectory in
the phase diagram of Figure ^. For 3-SAT instances, located on the $p = 1$ axis, we show that there are three different
behaviors, depending on the starting ratio $\alpha$. In the low sat phase, $\alpha < \alpha_L \simeq 3.003$, trajectories are always confined
in the sat region of the phase diagram. As a consequence, the search tree reduces essentially to a simple branch, and
the complexity scales linearly with $N$. On the opposite, in the unsat phase, the algorithm has to build a complete



$^3$ Though differential equations (g) depend on $t$ when written in terms of $c_2, c_3$, they are Markovian if rewritten in the variables
$p, \alpha$. Therefore, the locus of the 2+p-SAT instance points $p_0, \alpha_0$ giving rise to trajectories touching the threshold line in T
simply coincides with the 3-SAT trajectory starting from $\alpha_0 = \alpha_L$.

4 This situation is reminiscent of what happens in real-space renormalization, e.g. decimation. New couplings, absent in 
the initial Hamiltonian are generated that must be taken into account. The renormalization flow takes place in the smallest 
coupling space, stable under the decimation procedure, that includes the original Hamiltonian as a point (Leo Kadanoff, private 
communication) . 
search tree, with all branches ending with a contradiction, to prove unsatisfiability. We have imagined a tree growth
process that reflects faithfully the DPLL rules for assigning a new literal on a branch, but in which all branches evolve
in parallel, not in the real backtracking, sequential way. We have derived a partial differential equation describing
the stochastic growth of the search tree. The tree trajectory plotted on the phase diagram of Figure [| represents the
evolution of the instance parameters $p, \alpha$ for typical, statistically dominant branches in the tree. When the trajectory
hits the halt line, contradictions prevent the tree from further growing. Computational complexity is, to exponential
order, equal to the number of typical branches. Last, in the upper sat phase $\alpha_L < \alpha < \alpha_C$, the trajectory intersects
the critical line $\alpha_C(p)$ at some point G shown in Figure |], and enters the unsat phase of 2+p-SAT instances. Below
G, a complete refutation subtree has to be built. The full search tree turns out to be a mixture of a single branch
and some (not exponentially numerous in $N$) complete subtrees (Figure 13). The exponential contribution to the
complexity is simply the size of the subtree, which can be computed by analyzing the growth process starting from G.

Statistical physics tools can be useful to study the solving complexity of branch and bound algorithms applied
to hard combinatorial optimization or decision problems. The phase diagram of Figure ^ affords an accurate
understanding of the probabilistic complexity of DPLL variants on random instances. This view may reveal the nature
of the complexity of search algorithms for SAT and related NP-complete problems. In the sat phase, branch
trajectories are related to polynomial time computations while in the unsat region, tree trajectories lead to exponential
calculations. Depending on the starting point (ratio $\alpha$ of the 3-SAT instance), one or a mixture of these behaviors is
observed. A recent study of the random vertex cover problem [31] has shown that our approach can be successfully
applied to other decision problems.

Figure 4 furthermore gives some insights to improve the search algorithm. In the unsat region, trajectories must be
as horizontal as possible (to minimize their length) but resolution is necessarily exponential psfl . In the sat domain,
heuristics making trajectories steeper could avoid the critical line $\alpha_C(p)$ and solve 3-SAT polynomially up to threshold.

Fluctuations of complexity are another important issue that would deserve further studies. The numerical
experiments reported in Figure 5 show that the annealed complexity, that is, the average solving time required by DPLL,
agrees well with the typical complexity in the unsat phase but discrepancies appear in the upper sat phase. It comes as
no surprise that our analytical framework, designed to calculate the annealed complexity, provides accurate results in
the unsat regime. We were also able to get rid of the fluctuations, and to calculate the typical complexity in the upper
sat phase of 3-SAT from the annealed complexity of critical 2+p-SAT (Table IV). This suggests that fluctuations may
originate from atypical points G in the mixed structure of the search tree unveiled in Section VI. Such atypical points
G, coming from the $1/\sqrt{N}$ finite size fluctuations of the branch trajectory$^5$, lead to exponentially large fluctuations
of the complexity (Figure 5B).

It would be rewarding to achieve a better theoretical understanding of such fluctuations, and especially of
fluctuations of solving times from run to run of the DPLL procedure on a single instance. Practitioners of hard problem
solving have reached empirical evidence that exploiting in a cunning way the tails of the complexity distribution may
allow a drastic improvement of performances [33]. Suppose you are given one hour of CPU time to solve one instance
which, in 99% of cases, requires 10 hours of calculation, and with probability 1%, ten seconds only. Then, you could
proceed by running the algorithm for eleven seconds, stopping it if the instance has not been solved yet, and starting again,
hundreds of times if necessary, until the completion of the task. Investigating whether such a procedure could be used
to treat successfully huge 3-SAT instances would be very interesting.
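The numbers of this thought experiment can be worked out explicitly. The sketch below assumes that successive runs are statistically independent (for instance because the splitting heuristic makes random choices) and that each run finishes within ten seconds with probability 1%; both assumptions and all names are illustrative.

```python
def restart_success_probability(budget_s, cutoff_s, p_fast):
    """Number of runs fitting in the budget and probability that at least one succeeds."""
    runs = int(budget_s // cutoff_s)
    return runs, 1.0 - (1.0 - p_fast) ** runs


def expected_time_until_success(cutoff_s, p_fast, fast_s):
    """Mean time spent if restarts are allowed indefinitely.

    The number of failed runs before the first success is geometric with mean
    (1 - p_fast) / p_fast; each failure costs the full cutoff, the success costs fast_s.
    """
    return (1.0 - p_fast) / p_fast * cutoff_s + fast_s


if __name__ == "__main__":
    runs, probability = restart_success_probability(3600.0, 11.0, 0.01)
    print(runs, "runs, success probability", round(probability, 3))
    print("expected time (s):", expected_time_until_success(11.0, 0.01, 10.0))
```

Under these assumptions, 327 runs of eleven seconds fit in one hour and at least one of them succeeds with probability of about 96%, whereas a single uninterrupted run would succeed within the hour in only 1% of cases.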
APPENDIX A: LARGEST EIGENVALUES AND EIGENVECTORS OF THE EFFECTIVE BRANCHING MATRIX.

In this appendix, the largest eigenvalue of the effective branching matrix (p3|) is computed. We start by multiplying
both sides of the eigenvalue equation, obtained by applying the matrix $K_{q_2}$ onto the eigenvector $v_q(C_1')$, by $x^{C_1}$.



$^5$ Fluctuations also come from the finite width $W \sim N^{1-1/\nu}$ of the critical 2+p-SAT line. Recently derived lower bounds on
the critical exponent $\nu$ ($\ge 2$ [32]) reveal that finite size effects could be larger than $O(1/\sqrt{N})$; for 2-SAT indeed, $\nu = 3$ and
relative fluctuations scale as $W/N \sim N^{-1/3}$.
Then, we sum over $C_1$ and obtain the following equation for the eigenvector generating functions $V_q(x)$,



where 



and 



$$N(x) = e^{-y_2}\,(x+x^2) - 1, \qquad L(x) = \frac{1}{x}\,\exp\left(\frac{m_2}{2}\,x\,e^{-y_2}\right), \tag{A2}$$

$$\Lambda_q = \lambda_{q,q_2}\,\exp\left[m_2\left(1-\frac{e^{-y_2}}{2}\right)\right], \tag{A3}$$

with $y_2 = -i\,q_2$. From Section V, $q_2$ is purely imaginary at the saddle point, and it is therefore convenient to manipulate
the real-valued number $y_2$.

1. Zeroes and poles of $V_q$.

$N(x)$ has two zeroes $x_\dagger < 0 < x^*$ that are functions of $y_2$ solely and given by

$$x_\dagger = \frac{1}{2}\left(-1-\sqrt{1+4\,e^{y_2}}\right), \qquad x^* = \frac{1}{2}\left(-1+\sqrt{1+4\,e^{y_2}}\right). \tag{A4}$$

$N(x)$ is negative when its argument $x$ lies between the zeroes, positive otherwise. The function $L(x)$ is plotted in
Figure 15. The positive local minimum of $L(x)$ is located at $x_m = 2\,e^{y_2}/m_2$, where it takes the value $L_m = m_2\,e^{1-y_2}/2$. The number of poles
of $V_q(x)$ can be inferred from Figure 15.

• If $\Lambda_q < 0$, there is a single negative pole.

• If $0 < \Lambda_q < L_m$, there is no pole.

• If $\Lambda_q > L_m$, there are two positive poles $x_-, x_+$ with $x_- < x_m < x_+$ that coalesce when $\Lambda_q = L_m$.
2. Largest eigenvalue and eigenvector. 

Consider the largest eigenvector $v_0(C_1)$ and the associated eigenvalue $\Lambda_0 > 0$. The ratios

$$\mu(C_1) = \frac{v_0(C_1)}{\sum_{C_1'=0}^{\infty} v_0(C_1')} \tag{A5}$$

define the probability that the number of unit-clauses be equal to $C_1$ at a certain stage of the search. Consequently, as
long as no contradiction occurs, we expect all the ratios to be positive and to decrease quickly with $C_1$. The generating
function $V_0(x)$ must have a finite radius of convergence $R$, and be positive in the range $0 \le x < R$. Note that the
radius of convergence coincides with a pole of $V_0$.

The asymptotic behavior of $v_0(C_1)$ is simply given by the radius of convergence,

$$v_0(C_1) \sim R^{-C_1}, \tag{A6}$$

up to non-exponential terms. New branches are all the more frequent that many splittings are necessary and
unit-clauses are not numerous, i.e. $v_0(C_1)$ decreases as fast as possible. From the discussion of Section A 1, the radius of
convergence is maximal if $R = x_+$. To avoid the presence of another pole of $V_0(x)$ in $x_-$, the zero of the numerator
function $N(x)$ must fulfill $x^* = x_-$. We check that $V_0(x)$ is positive for $0 \le x < R$. The corresponding eigenvalue is
$\Lambda_0 = L(x_+)$, see eqn. (45). In the next Section, we explain that this theoretical value of $\Lambda_0$ has to be modified in a small
region of the $(y_2, m_2)$ plane.

The eigenvector $v_0$ undergoes a delocalization transition on the critical line $x_+(y_2, m_2) = 1$ or, equivalently, $L(x^*) =
L(1)$. At fixed $y_2$, the eigenvector is localized provided that parameter $m_2$ is smaller than a critical value

$$m_2^{loc0}(y_2) = 2\,e^{y_2}\,\frac{\ln x^*}{x^* - 1}. \tag{A7}$$

The corresponding curve is shown in Figure 16.



3. Excited state and spectral gap. 

As $K_{q_2}$ is not a symmetric matrix, complex eigenvalues may be found in the spectrum. We have performed numerical
investigations by diagonalizing upper left corners of $K_{q_2}$ of larger and larger sizes. From a technical point of view, we
define the $U \times U$ matrix $K^{(U)}_{q_2}(C_1, C_1') = K_{q_2}(C_1, C_1')$ for $0 \le C_1, C_1' < U$. Numerics show that complex eigenvalues
are of small modulus with respect to real-valued eigenvalues.

If $y_2 < \ln 2$, the largest eigenvalue $\Lambda_0^{(U)}$ of $K^{(U)}_{q_2}$ converges very quickly to the theoretical value $L(x_+)$ as $U$ increases
(with more than five correct digits when $U > 30$). At small values of $m_2$, the second largest eigenvalue (in absolute
value) is negative. Let us call it $\Lambda_\dagger$. The associated eigenvector $v_\dagger(C_1)$ is localized; all components of $v_\dagger$ have the
same sign except $v_\dagger(0)$. The value of $\Lambda_\dagger$ may be computed along the lines of Section A 2,

$$\Lambda_\dagger = L(x_\dagger). \tag{A8}$$

Indeed, $x_\dagger < 0$ implies from Figure 15 that $L(x_\dagger) < 0$. As $m_2$ increases (at fixed $y_2$), $\Lambda_\dagger$ becomes smaller (in modulus)
than the second largest positive eigenvalue $\Lambda_1$. $\Lambda_1$ is followed by a set of positive eigenvalues $\Lambda_2 > \Lambda_3 > \ldots$. Successive
$\Lambda_q$ ($q \ge 1$) get closer and closer as $U$ increases, to form a continuous spectrum in the $U \to \infty$ limit.

The eigenvectors $v_q$ ($q = 1, 2, 3, \ldots$) have real-valued components of periodically changing signs. The corresponding
generating functions have therefore complex-valued radii of convergence, and $V_q(x)$ does not diverge for real arguments
$x$. Consequently, the edge of the continuous spectrum $\Lambda_1$ is given by

$$\Lambda_1 = L_m. \tag{A9}$$

The above theoretical prediction is in excellent agreement with the large $U$ extrapolation of the numerical values of
$\Lambda_1$. Repeating the discussion of Section A 2, the first excited state $v_1$ becomes delocalized when $x_m = 1$, that is, when
$m_2$ exceeds the value

$$m_2^{loc1}(y_2) = 2\,e^{y_2}. \tag{A10}$$

The corresponding curve is shown in Figure 16.

The gap between $\Lambda_0$ and $\Lambda_1$ will be strictly positive as long as $x^* < x_m$. This defines another upper value for $m_2$,

$$m_2^{gap}(y_2) = \frac{2\,e^{y_2}}{x^*}, \tag{A11}$$

beyond which the largest eigenvalue coincides with the top $\Lambda_1$ of the continuous spectrum. As can be seen from
Figure 16, $\Lambda_0$ coincides with $L(x_+)$ in the region $m_2 < m_2^{loc0}$ as long as the largest eigenvalue is separated from the
continuous spectrum by a finite gap. Therefore, in the region ($y_2 > \ln 2$, $m_2^{gap} < m_2 < m_2^{loc0}$), the largest eigenvalue
merges with $\Lambda_1$, and is given by eqn. (A9).

When $m_2$ crosses the critical line $m_2^{loc0}$, the largest eigenvector gets delocalized and the average number of
unit-clauses flows to infinity. As a result of the avalanche of unitary clauses, contradictions necessarily occur and the growth
of the tree stops. Notice that, as far as the total number of branches is concerned, we shall be mostly concerned by
the $y_2 = 0$ axis. The critical values of interest are in this case: $m_2^{loc1} = 2$, $m_2^{loc0} = (3+\sqrt5)\,\ln[(1+\sqrt5)/2] \simeq 2.5197$
and $m_2^{gap} = 1+\sqrt5 \simeq 3.2361$.
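These three numbers can be recovered from the conditions stated in this appendix. The sketch below assumes $L(x) = \exp(m_2\,x\,e^{-y_2}/2)/x$, the positive zero $x^*$ of $N(x)$, and the relation $\lambda_0 = \Lambda_0\,\exp[-m_2(1-e^{-y_2}/2)]$ with $\Lambda_0 = L(x^*)$; the expression used for the delocalization threshold is simply the solution of $L(x^*) = L(1)$ for $m_2$, written here for $y_2 < \ln 2$. Function names are illustrative.

```python
import math


def x_star(y2=0.0):
    """Positive zero of N(x) = e^{-y2} (x + x^2) - 1."""
    return 0.5 * (-1.0 + math.sqrt(1.0 + 4.0 * math.exp(y2)))


def m2_loc1(y2=0.0):
    """Delocalization of the first excited state: x_m = 1."""
    return 2.0 * math.exp(y2)


def m2_gap(y2=0.0):
    """Closing of the spectral gap: x* = x_m."""
    return 2.0 * math.exp(y2) / x_star(y2)


def m2_loc0(y2=0.0):
    """Delocalization of the largest eigenvector: L(x*) = L(1), valid for y2 < ln 2."""
    xs = x_star(y2)
    return 2.0 * math.exp(y2) * math.log(xs) / (xs - 1.0)


def largest_eigenvalue(m2, y2=0.0):
    """lambda_0 = L(x*) exp[-m2 (1 - e^{-y2} / 2)], assuming a finite gap."""
    xs = x_star(y2)
    big_lambda = math.exp(0.5 * m2 * math.exp(-y2) * xs) / xs
    return big_lambda * math.exp(-m2 * (1.0 - 0.5 * math.exp(-y2)))


if __name__ == "__main__":
    print(m2_loc1(), m2_loc0(), m2_gap())
    print(largest_eigenvalue(m2_loc0()))
```

On the $y_2 = 0$ axis the script should print values close to 2, 2.5197 and 3.2361, and the eigenvalue evaluated at the delocalization threshold should equal one, which is the halt criterion used in the main text.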
APPENDIX B: QUADRATIC APPROXIMATION FOR THE GROWTH PARTIAL DIFFERENTIAL EQUATION.

The partial differential equation for the growth of the search tree may be written in terms of the Legendre transform
$\varphi(y_2,y_3,t)$ of the logarithm of the number of branches as

$$\frac{\partial\varphi}{\partial t}(y_2,y_3,t) = g_1(y_2) - \frac{2}{1-t}\,g_2(y_2)\,\frac{\partial\varphi}{\partial y_2} - \frac{3}{1-t}\,g_3(y_2,y_3)\,\frac{\partial\varphi}{\partial y_3} \tag{B1}$$

with

$$\begin{aligned}
g_1(y_2) &= -y_2 + \ln\left[\frac{1}{2}\left(1+\sqrt{1+4\,e^{y_2}}\right)\right], \\
g_2(y_2) &= 1 - \frac{e^{-y_2}}{4}\left(1+\sqrt{1+4\,e^{y_2}}\right), \\
g_3(y_2,y_3) &= 1 - \frac{1}{2}\,e^{-y_3}\left(1+e^{y_2}\right).
\end{aligned} \tag{B2}$$



1. Linear approximation.

At the first order in $y_2, y_3$, we replace the functions $g$ appearing in (B1) with their linearized counterparts,

$$\begin{aligned}
g_1^{(1)}(y_2) &= \ln\left(\frac{1+\sqrt5}{2}\right) - \frac{5+\sqrt5}{10}\,y_2, \\
g_2^{(1)}(y_2) &= \frac{3-\sqrt5}{4} + \frac{5+3\sqrt5}{20}\,y_2, \\
g_3^{(1)}(y_2,y_3) &= y_3 - \frac{1}{2}\,y_2,
\end{aligned} \tag{B3}$$

and solve the corresponding partial differential equation. The solution, called $\varphi^{(1)}(y_2,y_3,t)$, is given in equations (53)
and (54).
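The linearized coefficients can be cross-checked by differentiating the $g$ functions numerically. The sketch below assumes the forms $g_1 = -y_2 + \ln[(1+\sqrt{1+4e^{y_2}})/2]$, $g_2 = 1 - e^{-y_2}(1+\sqrt{1+4e^{y_2}})/4$ and $g_3 = 1 - e^{-y_3}(1+e^{y_2})/2$, and uses central finite differences at the origin; names are illustrative.

```python
import math

SQRT5 = math.sqrt(5.0)


def g1(y2):
    return -y2 + math.log(0.5 * (1.0 + math.sqrt(1.0 + 4.0 * math.exp(y2))))


def g2(y2):
    return 1.0 - 0.25 * math.exp(-y2) * (1.0 + math.sqrt(1.0 + 4.0 * math.exp(y2)))


def g3(y2, y3):
    return 1.0 - 0.5 * math.exp(-y3) * (1.0 + math.exp(y2))


def derivative(f, h=1e-5):
    """Central finite-difference estimate of f'(0)."""
    return (f(h) - f(-h)) / (2.0 * h)


def linear_coefficients():
    """Values and slopes at y2 = y3 = 0, to be compared with the linearized g functions."""
    return {
        "g1(0)": g1(0.0),
        "g1'(0)": derivative(g1),
        "g2(0)": g2(0.0),
        "g2'(0)": derivative(g2),
        "dg3/dy2": derivative(lambda y: g3(y, 0.0)),
        "dg3/dy3": derivative(lambda y: g3(0.0, y)),
    }


if __name__ == "__main__":
    for name, value in linear_coefficients().items():
        print(name, value)
```

The printed values should match $\ln[(1+\sqrt5)/2]$, $-(5+\sqrt5)/10$, $(3-\sqrt5)/4$, $(5+3\sqrt5)/20$, $-1/2$ and $1$ respectively.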

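2. Quadratic approximation.

At the second order in $y_2, y_3$, we consider the quadratic corrections to the $g$ functions,

$$\begin{aligned}
g_1^{(2)}(y_2) &= \frac{\sqrt5}{50}\,(y_2)^2, \\
g_2^{(2)}(y_2) &= -\frac{25+11\sqrt5}{200}\,(y_2)^2, \\
g_3^{(2)}(y_2,y_3) &= -\frac{1}{2}\,(y_3)^2 + \frac{1}{2}\,y_3\,y_2 - \frac{1}{4}\,(y_2)^2,
\end{aligned} \tag{B4}$$

and look for a solution of (B1) of the form $\varphi = \varphi^{(1)} + \varphi^{(2)}$. Neglecting higher order terms in $g^{(2)}\varphi^{(2)}$, we end with

$$\frac{\partial\varphi^{(2)}}{\partial t}(y_2,y_3,t) = G^{(2)}(y_2,y_3) - \frac{2}{1-t}\,g_2^{(1)}(y_2)\,\frac{\partial\varphi^{(2)}}{\partial y_2} - \frac{3}{1-t}\,g_3^{(1)}(y_2,y_3)\,\frac{\partial\varphi^{(2)}}{\partial y_3}, \tag{B5}$$

where

$$G^{(2)}(y_2,y_3) = g_1^{(2)}(y_2) - \frac{2}{1-t}\,g_2^{(2)}(y_2)\,\frac{\partial\varphi^{(1)}}{\partial y_2} - \frac{3}{1-t}\,g_3^{(2)}(y_2,y_3)\,\frac{\partial\varphi^{(1)}}{\partial y_3}. \tag{B6}$$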



A particular solution of (B5) may be found under the form

$$\varphi^{(2)}_{part.}(y_2,y_3,t) = a_0(t) + a_{21}(t)\,y_2 + a_{22}(t)\,(y_2)^2 + a_{33}(t)\,(y_3)^2, \tag{B7}$$

where the $a$'s are linear combinations of $1-t$, $(1-t)^3$ and $(1-t)^{\mu}$ with $\mu = (5+3\sqrt5)/10$.
The general solution of the homogeneous version of (B5) reads

$$\varphi^{(2)}_{hom.}(y_2,y_3,t) = \Phi\left[(1-t)^3\left(y_3 - \frac{75+9\sqrt5}{116}\,y_2 - \frac{15-4\sqrt5}{58}\right),\; (1-t)^{\mu}\left(y_2 + \frac{7\sqrt5-15}{2}\right)\right], \tag{B8}$$

where $\Phi$ is a differentiable function of two arguments $u, v$. Assuming that $\Phi$ is a quadratic form in $u$ and $v$, we
fix its six coefficients through the initial condition (at time $t = 0$),

$$\varphi^{(2)}(y_2,y_3,0) = \varphi^{(2)}_{part.}(y_2,y_3,0) + \varphi^{(2)}_{hom.}(y_2,y_3,0) = 0. \tag{B9}$$

The resulting expression for $\varphi^{(2)}$ is too long to be given here but can be obtained safely by using an algebraic
computation software, e.g. Mathematica.
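The structure of the homogeneous solution can be tested numerically: any function of two quantities that stay constant along the characteristics of the linearized equation solves its homogeneous version. The sketch below assumes the characteristics $dy_2/dt = 2\,g_2^{(1)}(y_2)/(1-t)$ and $dy_3/dt = 3\,g_3^{(1)}(y_2,y_3)/(1-t)$, integrates them with a fourth-order Runge-Kutta scheme, and monitors the two candidate invariants $u$ and $v$; the starting point and all names are arbitrary choices for illustration.

```python
import math

SQRT5 = math.sqrt(5.0)
MU = (5.0 + 3.0 * SQRT5) / 10.0


def rhs(t, y2, y3):
    """Characteristic equations of the linearized, homogeneous growth equation (assumed form)."""
    dy2 = 2.0 * ((3.0 - SQRT5) / 4.0 + (5.0 + 3.0 * SQRT5) / 20.0 * y2) / (1.0 - t)
    dy3 = 3.0 * (y3 - 0.5 * y2) / (1.0 - t)
    return dy2, dy3


def invariants(t, y2, y3):
    """Candidate arguments u, v of the arbitrary function in the homogeneous solution."""
    u = (1.0 - t) ** 3 * (y3 - (75.0 + 9.0 * SQRT5) / 116.0 * y2 - (15.0 - 4.0 * SQRT5) / 58.0)
    v = (1.0 - t) ** MU * (y2 + (7.0 * SQRT5 - 15.0) / 2.0)
    return u, v


def integrate(y2, y3, t_end=0.5, steps=2000):
    """Fourth-order Runge-Kutta integration of the characteristics from t = 0 to t_end."""
    t, h = 0.0, t_end / steps
    for _ in range(steps):
        k1 = rhs(t, y2, y3)
        k2 = rhs(t + 0.5 * h, y2 + 0.5 * h * k1[0], y3 + 0.5 * h * k1[1])
        k3 = rhs(t + 0.5 * h, y2 + 0.5 * h * k2[0], y3 + 0.5 * h * k2[1])
        k4 = rhs(t + h, y2 + h * k3[0], y3 + h * k3[1])
        y2 += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y3 += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        t += h
    return t, y2, y3


if __name__ == "__main__":
    print(invariants(0.0, 0.3, -0.2))
    print(invariants(*integrate(0.3, -0.2)))
```

The two printed pairs should coincide up to the integration error, which supports the constants appearing in the two arguments of $\Phi$ under the stated assumptions.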
TABLES

Ratio $\alpha$ of clauses/var.    Nb. of var.    Resolution time
20                          900            6 seconds
10                          700            1 hour
7                           400            20 minutes
4.3                         350            2 days
3.5                         500            20 minutes

TABLE I. Typical computational time required to solve some hard random 3-SAT instances for different ratios $\alpha$ and sizes
$N$. Slightly below threshold, some rare instances may require much longer computational time (Section III D 1).



Ratio α            Experiments                            Theory
of clause/var.     nodes              histogram           lin.      quad.
20                 0.0153 ± 0.0002    0.0151 ± 0.0001     0.0152    0.0152
15                 0.0207 ± 0.0002    0.0206 ± 0.0001     0.0206    0.0206
10                 0.0320 ± 0.0005    0.0317 ± 0.0002     0.0319    0.0319
7                  0.0481 ± 0.0005    0.0477 ± 0.0005     0.0477    0.0476
4.3                0.089 ± 0.001      0.0895 ± 0.001      0.0875    0.0852

TABLE II. Logarithm of the complexity ω from experiments and theory in the unsat phase. Values of ω from measures
of search tree sizes (nodes), histograms of branch lengths (histogram) and theory within linear (lin.) and quadratic (quad.)
approximations. Note the excellent agreement between theory and experiments for large α.
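
To give a feeling for what the values of ω in Table II mean in practice, the short sketch below converts them into search tree sizes, using the "nodes" column of Table II together with the instance sizes N quoted in Table I. It assumes that the tree size is simply 2^(N ω), i.e. it ignores all finite-size corrections, so the numbers are orders of magnitude only; the dictionary `cases` and the helper `log2_tree_size` are illustrative names, not part of the original study.

```python
# Experimental omega ("nodes" column of Table II) paired with the sizes N of Table I.
# Assumption: tree size ~ 2**(N * omega), finite-size corrections neglected.
cases = {
    20: (0.0153, 900),
    10: (0.0320, 700),
    7: (0.0481, 400),
    4.3: (0.089, 350),
}

def log2_tree_size(omega, n_vars):
    """Base-2 logarithm of the search tree size for n_vars variables."""
    return omega * n_vars

for alpha, (omega, n_vars) in cases.items():
    bits = log2_tree_size(omega, n_vars)
    print(f"alpha = {alpha}: N = {n_vars}, tree size of order 2^{bits:.1f} ~ {2 ** bits:.1e} nodes")
```

Under this crude estimate the four instances rank in the same order as the resolution times of Table I (6 seconds, 20 minutes, 1 hour, 2 days).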



Ratio α_0          Slope      Curvature β
of clause/var.     γ          nodes      histogram
15                 -1.47      69.6       75.6
10                 -1.32      56.5       47.8
7                  -1.06      39.4       29.6
4.3                -0.58      20.2       13.6


TABLE III. Details on the fits of search tree sizes from equation (y). For different ratios α_0, the slope γ of the fit of
ω_N - log_2 N/N vs. 1/N is shown as well as the corresponding curvature β deduced from (ph (column "nodes"). The curvatures
measured directly at the top of the branch length histograms are listed in the "histogram" column.



Parameters        Experiments                                          Theory
p       α         ann.             nod.             his.               lin.      quad.
1       3.5       0.043 ± 0.002    0.035 ± 0.003    -                  0.0355    0.0329
0.78    3.02      0.044 ± 0.002    0.041 ± 0.002    0.041 ± 0.002      0.0440    0.0407



TABLE IV. Logarithm of the complexity ω from experiments and theory in the upper sat phase. Experiments determine ω
from measures of the annealed complexity (ann.), of the typical search tree sizes (nod.) and of histograms of branch lengths
(his.). Data are presented for 3-SAT instances with ratio α = 3.5, and 2+p-SAT instances with parameters p = 0.78, α = 3.02.
Theoretical predictions within linear (lin.) and quadratic (quad.) approximations are reported for the 2+p-SAT model, and
for 3-SAT using eqn.@.
[Figure 1. Horizontal axis: number of clauses per variable α, with the threshold α_C marked.]

FIG. 1. Solving complexity of random 3-SAT as a function of the ratio α of clauses per variable, for three increasing
problem sizes N. Data are averaged over 10,000 randomly drawn samples. Complexity is maximal at the threshold α_C ≃ 4.3.




FIG. 2. Examples of search trees. A. simple branch: the algorithm easily finds a solution without ever (or with a negligible
amount of) backtracking. B. dense tree: in the absence of a solution, the algorithm builds a "bushy" tree, with many branches
of various lengths, before stopping. C. mixed case, branch + tree: if many contradictions arise before reaching a solution, the
resulting search tree can be decomposed into a single branch followed by a dense tree. The junction G is the highest
backtracking node reached back by DPLL.
[Figure 3. Table with columns "step", "clauses" and "search tree", following DPLL on the example instance: split w = T; split x = T; propagation y = F, y = T (contradiction); backtracking to stage 3, x = F; propagation z = T, y = T; solution w = T, x = F, y = T, z = T.]


FIG. 3. Example of 3-SAT instance and Davis-Putnam-Loveland-Logemann resolution. Step 0. The instance consists of
M = 5 clauses involving N = 4 variables that can be assigned to true (T) or false (F). w̄ means (NOT w) and ∨ denotes
the logical OR. The search tree is empty. 1. DPLL randomly selects a variable among the shortest clauses and assigns it to
satisfy the clause it belongs to, e.g. w = T (splitting with the Generalized Unit Clause, GUC, heuristic). A node and an edge
symbolizing respectively the variable chosen (w) and its value (T) are added to the tree. 2. The logical implications of the last
choice are extracted: clauses containing w are satisfied and eliminated, clauses including w̄ are simplified, and the remaining
clauses are left unchanged. If no unitary clause (i.e. with a single variable) is present, a new choice of variable has to be
made. 3. Splitting takes over. Another node and another edge are added to the tree. 4. Same as step 2 but now unitary
clauses are present. The variables they contain have to be fixed accordingly (propagation). 5. The propagation of the unitary
clauses results in a contradiction. The current branch dies out and gets marked with C. 6. DPLL backtracks to the last split
variable (x), inverts it (F) and creates a new edge. 7. Same as step 4. 8. Propagation of the unitary clauses eliminates all the
clauses. A solution S is found. This example shows how DPLL finds a solution for a satisfiable instance. For an unsatisfiable
instance, unsatisfiability is proven when backtracking (see step 6) is no longer possible since all split variables have already
been inverted. In this case, all the nodes in the final search tree have two descendant edges and all branches terminate with a
contradiction C.
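
The procedure described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the experiments: literals are encoded as signed integers (w, x, y, z = 1, 2, 3, 4, a negative sign meaning NOT), the split variable is taken deterministically as the first literal of the first shortest clause (the caption describes a random choice), and the instance at the bottom is a hypothetical one whose literal signs were chosen to reproduce the sequence of events of the figure, since the negation bars of the original clauses are not legible. The names `simplify`, `dpll` and `stats` are illustrative.

```python
def simplify(clauses, lit):
    """Set literal `lit` to true: drop satisfied clauses, shorten the others.
    Returns None if an empty clause (contradiction) is produced."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue                      # clause satisfied and eliminated
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None                   # contradiction
        out.append(reduced)
    return out

def dpll(clauses, assignment=None, stats=None):
    """Return a satisfying assignment {variable: bool} or None if none exists."""
    assignment = dict(assignment or {})
    stats = stats if stats is not None else {"splits": 0}
    # Propagation: fix the variables appearing in unitary clauses.
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
        clauses = simplify(clauses, unit)
        if clauses is None:
            return None                   # branch ends with a contradiction C
    if not clauses:
        return assignment                 # all clauses eliminated: solution S
    # Split (GUC-like): take a literal from a shortest clause and satisfy it first.
    lit = min(clauses, key=len)[0]
    stats["splits"] += 1
    for choice in (lit, -lit):            # second value is tried after backtracking
        reduced = simplify(clauses, choice)
        if reduced is not None:
            trial = dict(assignment)
            trial[abs(choice)] = choice > 0
            result = dpll(reduced, trial, stats)
            if result is not None:
                return result
    return None

# Hypothetical instance (signs are assumptions): M = 5 clauses, N = 4 variables.
instance = [[1, -2, 3], [-1, 2, 4], [-1, -2, -3], [-1, -2, 3], [2, 3, -4]]
stats = {"splits": 0}
solution = dpll(instance, stats=stats)
print(solution, stats)
```

On this instance the sketch splits on w, then on x, meets the contradiction y = F, y = T, backtracks to x = F and ends with w = T, x = F, y = T, z = T, as in the figure. Variables that never appear in a split or a unitary clause are simply left out of the returned dictionary.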







FIG. 4. Phase diagram of the 2+p-SAT model and trajectories generated by DPLL. The threshold line α_C(p) (bold full
line) separates sat (lower part of the plane) from unsat (upper part) phases. Extremities lie on the vertical 2-SAT (left) and
3-SAT (right) axes at coordinates (p = 0, α_C = 1) and (p = 1, α_C ≃ 4.3) respectively. The threshold line coincides with the
α = 1/(1 - p) curve (dotted line) when p < p_S ≃ 0.41, that is, up to the tricritical point T_S. Departure points for DPLL
trajectories are located on the 3-SAT vertical axis and the corresponding values of α_0 are explicitly given. Dashed curves
represent tree trajectories in the unsat region (thick lines, black arrows) and branch trajectories in the sat phase (thin lines,
empty arrows). Arrows indicate the direction of motion along trajectories parametrized by the fraction t of variables set by
DPLL. For small ratios, e.g. α_0 = 2 < α_L ≃ 3.003, trajectories remain confined in the sat phase. At α_L, the single branch
trajectory hits tangentially the threshold line in T of coordinates (2/5, 5/3), very close to T_S. In the intermediate range
α_L < α_0 < α_C, the branch trajectory intersects the threshold line at some point G that depends on α_0. A dense tree then grows
in the unsat phase, as happens when 3-SAT departure ratios are above threshold, α_0 > α_C ≃ 4.3. The tree trajectory halts on
the dot-dashed curve α ≃ 1.259/(1 - p) where the tree growth process stops. At this point, DPLL has reached back the highest
backtracking node in the search tree, that is, the first node when α_0 > α_C, or node G for α_L < α_0 < α_C. In the latter case, a
solution can be reached from a new descending branch (rightmost path in Figure 2C) while, in the former case, unsatisfiability
is proven. Small squares show the trajectory corresponding to this successful branch for α_0 = 3.5, as obtained from simulations
for N = 300. The trajectory coincides perfectly with the theoretical branch trajectory up to point G (not shown), and then
reaches the α = 0 axis when a solution is found.
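
The numbers α_L ≃ 3.003 and T = (2/5, 5/3) quoted in this caption can be checked with a short computation. The sketch below assumes the usual rate equations for a single branch under the GUC heuristic, dc_3/dt = -3 c_3/(1 - t) and dc_2/dt = 3 c_3/(2(1 - t)) - c_2/(1 - t) - 1, where c_2 and c_3 are the numbers of 2- and 3-clauses divided by N; these equations are not restated in the caption and are taken here as an assumption, as are the function names. Their solution with c_3(0) = α_0, c_2(0) = 0 is c_3 = α_0 (1 - t)^3 and c_2 = (1 - t) [ (3 α_0/4) t (2 - t) + ln(1 - t) ], and the trajectory coordinates are p = c_3/(c_2 + c_3), α = (c_2 + c_3)/(1 - t).

```python
import math

def branch_point(alpha0, t):
    """Coordinates (p, alpha) of the branch trajectory after a fraction t of
    variables has been set (valid while 2-clauses are present, c2 > 0)."""
    s = 1.0 - t
    c3 = alpha0 * s ** 3
    c2 = s * (0.75 * alpha0 * t * (2.0 - t) + math.log(s))
    return c3 / (c2 + c3), (c2 + c3) / s

def max_two_clause_ratio(alpha0):
    """Maximum over t of c2/(1 - t) = alpha (1 - p), reached at (1 - t)^2 = 2/(3 alpha0)."""
    return 0.75 * alpha0 - 0.5 - 0.5 * math.log(1.5 * alpha0)

def alpha_L(lo=2.0, hi=4.0, tol=1e-12):
    """Starting ratio for which the branch trajectory just touches the line
    alpha = 1/(1 - p), found by bisection (the maximum above grows with alpha0)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if max_two_clause_ratio(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a_l = alpha_L()
t_star = 1.0 - math.sqrt(2.0 / (3.0 * a_l))     # time at which the line is touched
p_t, alpha_t = branch_point(a_l, t_star)
print(f"alpha_L = {a_l:.4f}, tangency point T = ({p_t:.4f}, {alpha_t:.4f})")
```

Within these assumptions the tangency condition reproduces the values given in the caption, and for α_0 = 2 the quantity α (1 - p) stays below 1 along the whole trajectory, consistent with a branch confined to the sat phase.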



[Figure 5, panels A and B. Horizontal axes: logarithm of complexity (divided by size N).]

FIG. 5. Distribution of the logarithms (in base 2, and divided by N) of the complexity for four different sizes N = 30
(dot-dashed), 50 (dashed), 100 (dotted) and 200 (full line). The histograms are normalized to unity and represent 50,000
instances. The distribution gets more and more concentrated as the size grows. A. Ratio α = 10. Curves are roughly
symmetrical around their mean with small tails on their flanks; large fluctuations from sample to sample are absent. B. Ratio
α = 3.1. Curves have large tails on the right, reflecting the presence of rare, very hard samples.
FIG. 6. Complexity of solving in the sat region for α < α_L ≃ 3.003, divided by the size N of the instances. Numerical
data are for sizes N = 50 (cross), 75 (square), 100 (diamond), 500 (triangle) and 1000 (circle). For the two biggest sizes,
simulations have been carried out for ratios larger than 2.5 only. Data for different N collapse onto the same curve, showing
that complexity scales linearly with N. The bold continuous curve is the analytical prediction γ(α) from Section IV C 2. Note
the perfect agreement with numerics except at large ratios where finite size effects are important, due to the cross-over to the
exponential regime above α_L ≃ 3.003.







FIG. 7. Logarithm of the number of branches in unsat search trees as a function of the branch length. Main figure: the size
of the search tree is a decreasing function of α at fixed N. Histograms are presented here for ratios equal to α = 4.3 (solid
line, N = 200), α = 7 (dotted line, N = 400), α = 10 (dashed line, N = 600) and α = 20 (dotted-dashed line, N = 900) and
have been averaged over hundreds of samples. Inset: the heights of the tops of the histograms show a very weak dependence
upon N. Numerical extrapolations of ω_N to N → ∞ and statistical errors are reported in Table II. Sat instances (which may be
present for small sizes at α = 4.3) have not been considered in the averaging procedure. For each inset, we give below the sizes
N followed by the number of instances used for averaging (in parentheses). Ratio α = 4.3: solid line: 100 (5000), dotted line:
150 (500), dashed line: 200 (400); α = 7: solid line: 200 (10000), dotted line: 300 (1000), dashed line: 400 (200); α = 10: solid
line: 400 (500), dotted line: 500 (400), dashed line: 600 (100); α = 20: solid line: 700 (600), dotted line: 800 (1000), dashed
line: 900 (1000).
FIG. 8. Solving complexity in the upper sat phase for sizes N ranging from N = 100 to N = 400. The size of the symbols
accounts for the largest statistical error bar. A. 3-SAT problem with ratio α = 3.5: typical (average of the logarithm, full
circles) and annealed (logarithm of the average, full triangles) size of the search tree. Dotted lines are quadratic and linear fits of
the typical and annealed complexities, giving ω_3^typ = 0.035 ± 0.03 and ω_3^ann = 0.043 ± 0.02 in the infinite size limit. B. Related
2+p-SAT problem with parameters p_G = 0.78, α_G = 3.02. The typical complexity is measured from the size of the search tree
(circles) and the top of the branch length distribution (triangles), with the same large N extrapolation: ω_{2+p}^typ = 0.041 ± 0.02.
This value is slightly smaller than the annealed complexity ω_{2+p}^ann = 0.044 ± 0.02. All fits are linear.
FIG. 9. Coordinates of the highest backtracking point G in the search tree with a starting ratio α_0 = 3.5 (the upper sat
phase) for different sizes N = 100, . . . , 500 and averaged over 10,000 (small sizes) to 128 (N = 500) instances. The fits shown
are quadratic functions of the plotting coordinate log_2 N/N, and give the extrapolated location of G in the large size limit:
p_G = 0.78 ± 0.01, α_G = 3.02 ± 0.02. Note the uncertainty on these values due to the small number of instances available at large
instance sizes.
FIG. 10. Schematic view of the dynamics of clauses. Clauses are sorted into three recipients according to their lengths, i.e.
the number of variables they include. Each time a variable is assigned by DPLL, clauses are modified, resulting in a dynamics
of the recipient populations (lines with arrows). Dashed lines indicate the elimination of (satisfied) clauses of lengths 1, 2 or
3. Bold lines represent the reduction of 3-clauses into 2-clauses, or 2-clauses into 1-clauses. The flows of clauses are denoted
by e_1, e_2, e_3 and w_2, w_1 respectively. A solution is found when all recipients are empty. The level of the rightmost recipient
coincides with the number of unitary clauses. If this level is low (i.e. O(1)), the probability that two contradictory clauses x
and x̄ are present in the recipient is vanishingly small. When the level is high (i.e. O(√N)), contradictions will occur with a
large probability.
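
The statement about the level of unitary clauses is a birthday-paradox effect, which the following sketch illustrates numerically. It assumes, for illustration only, that each unitary clause picks its variable uniformly among N variables and its sign with probability 1/2, independently of the others; the function names and the chosen values of N and of the levels are hypothetical. Two unitary clauses then contradict each other with probability 1/(2N), so a level k gives about k^2/(4N) contradictory pairs on average.

```python
import math
import random

def contradiction_probability(n_vars, n_units, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that some variable appears
    both as x and as NOT x among n_units random unitary clauses."""
    rng = random.Random(seed)             # fixed seed: reproducible estimate
    hits = 0
    for _ in range(trials):
        seen = {}                         # variable index -> sign imposed so far
        for _ in range(n_units):
            var = rng.randrange(n_vars)
            sign = rng.random() < 0.5
            if seen.setdefault(var, sign) != sign:
                hits += 1                 # x and NOT x are both present
                break
    return hits / trials

def pair_estimate(n_vars, n_units):
    """Poisson-type estimate built from the number of pairs of unitary clauses."""
    pairs = n_units * (n_units - 1) / 2
    return 1.0 - math.exp(-pairs / (2.0 * n_vars))

N = 10_000
low = contradiction_probability(N, 3)                    # level O(1)
high = contradiction_probability(N, 3 * math.isqrt(N))   # level O(sqrt(N))
print(low, high, pair_estimate(N, 3 * math.isqrt(N)))
```

With a level of order 1 the estimated probability is close to zero, whereas a level of a few times √N makes a contradiction very likely, in line with the caption.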
FIG. 11. Schematic representation of the search tree building up in the toy dynamical process. A branch carrying a formula
with C_3 3-clauses splits once a variable is assigned, and gives birth to two branches having C_3' and C_3'' 3-clauses respectively.
The variable is assigned randomly, independently of the 3-clauses. Clauses of lengths one or two are not considered.
FIG. 12. Logarithm ω of the number of branches (base 2, and divided by N) as a function of the number C_3 of 3-clauses
at different times t for the simplified growth process of Section V A. At time t = 0, the search tree is empty and the ratio of
clauses per variable equals α_0 = 5. Later on, the tree grows and a whole distribution of branches appears. Dominant branches
correspond to the top of the distributions, of coordinates ĉ_3(t), ω̂(t). Branches become exponentially more numerous with time,
while they carry fewer and fewer 3-clauses.
FIG. 13. Detailed structure of the search tree in the upper sat phase (α_L < α < α_C). DPLL starts with a satisfiable 3-SAT
instance and transforms it into a sequence of 2+p-SAT instances. The bold, leftmost branch in the tree symbolizes the first
descent made by DPLL. Above node G_+, instances are almost surely satisfiable while below G_-, instances have no solutions.
The size of the critical window, that is, the number of variables to assign to reach G_- from G_+, is W ≪ N. G_0 denotes the
highest node in the tree carrying a satisfiable 2+p-SAT instance. A grey triangle accounts for the (exponentially) large refutation
subtree that DPLL has to go through before backtracking above G_-. By definition, the highest node reached back by DPLL
is G_0. Further backtracking, below G_0, might be necessary but a solution will eventually be found along the rightmost branch
issued from G_0.
FIG. 14. Four different regions coexist in the phase diagram of the 2+p-SAT model, according to whether complexity is
polynomial or exponential, and formulae are sat or unsat. Borderlines are (from top to bottom): α = 1/(1 - p) (dotted line),
α_C(p) (full line), and the branch trajectory (dashed line), starting in (1, α_L) and ending at point T tangentially to the threshold
line. The tricritical point T_S, with coordinates p_S ≃ 0.41, α_S = 1/(1 - p_S), separates second order from first order critical points on
the threshold line, and lies very close to T. Inset: schematic blow-up of the T, T_S region (same symbols as in the main picture).
FIG. 15. Sketch of the function L(x) appearing in the denominator of the eigenvector generating function V_q(x). L(x) is positive
(resp. negative) for positive (resp. negative) arguments x. The local positive minimum is located at x_m = 1/γ_2, L_m = e γ_2.
The height of the minimum, L_m, is equal to the edge Λ_1 of the (excited states) continuous spectrum. For Λ > L_m, the equation
L(x) = Λ has two roots x_-, x_+ such that x_- < x_m < x_+. When x_- coincides with the positive zero x* of the numerator N(x),
the maximal eigenvalue Λ_0 is obtained (Appendix A).
FIG. 16. Critical curves of m_2 = 2c_2/(1 - t) as a function of the parameter y_2. From bottom to top (left side): delocalization
threshold m_2^{loc,1} for the second largest eigenvector (dotted line), delocalization threshold m_2^{loc,0} for the largest eigenvector (full
line), and zero gap curve m_2^{gap} (dashed line). All curves meet at y_2 = ln 2, m_2 = 4. For y_2 < ln 2 and small m_2, the largest
eigenvector is separated from a continuum of excited states by a finite gap, and becomes delocalized when
m_2 reaches m_2^{loc,0}. For y_2 > ln 2, the largest eigenvector merges with the continuous spectrum when m_2 > m_2^{gap}, and gets
delocalized on the critical line m_2^{loc,1}.